MS, University of Cincinnati, 2014, Engineering and Applied Science: Computer Science
A primary purpose of a database is to provide requested data efficiently, and query performance can be improved in many ways. One efficient way to handle multiple queries posted simultaneously is to distribute the database across several sites, so that each query accesses only the site holding its relevant data rather than the entire database. Distributing a database involves fragmenting the data and allocating the fragments across sites. Several research works address workload-based fragmentation, since the aim of fragmentation is to optimize query response time [MD08]. In particular, clustering the data according to query predicates or attributes has been shown to perform well for fragmentation. Mahboubi and Darmont propose a k-means based fragmentation approach [MD08], but their approach does not consider the similarity of query predicates in the workload before performing the k-means clustering. We cluster similar selection predicates in the workload as a preprocessing step for the fragmentation, with the expectation of further improving query performance. We investigate clustering techniques and study the resulting performance for a selected case study. We conclude that, in general, for our workloads and experimental parameters, the final clusters obtained using our predicate preprocessing system are tighter and more meaningful. As the number of similar values in the workload decreases, the relative savings of the predicate preprocessing system are reduced. If there are no similar values in the workload, the original fragmentation system is more efficient.
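The preprocessing idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy workload, the function names `merge_similar_predicates` and `usage_matrix`, and the similarity rule (numeric values on the same attribute and operator that lie within a fixed `tolerance` of each other) are all assumptions made for illustration. The sketch merges similar selection predicates into one representative and then builds the binary query-by-predicate matrix that a k-means step could take as input.

```python
from collections import defaultdict

# Hypothetical workload: each query is a list of (attribute, operator, value) predicates.
workload = [
    [("price", ">", 100), ("region", "=", "EU")],
    [("price", ">", 102), ("region", "=", "EU")],
    [("price", ">", 500)],
    [("region", "=", "US")],
]


def merge_similar_predicates(workload, tolerance=5):
    """Map every predicate to a representative predicate.

    Assumed similarity rule: numeric values on the same attribute and
    operator are merged when the gap to the previous value (in sorted
    order) is at most `tolerance`. Non-numeric predicates map to themselves.
    """
    groups = defaultdict(set)
    for query in workload:
        for attr, op, value in query:
            groups[(attr, op)].add(value)

    representative = {}
    for (attr, op), values in groups.items():
        numeric = sorted(v for v in values if isinstance(v, (int, float)))
        rep = prev = None
        for v in numeric:
            if prev is None or v - prev > tolerance:
                rep = v  # gap too large: start a new group
            representative[(attr, op, v)] = (attr, op, rep)
            prev = v
        for v in values:
            if not isinstance(v, (int, float)):
                representative[(attr, op, v)] = (attr, op, v)
    return representative


def usage_matrix(workload, representative):
    """Build a binary matrix: rows are queries, columns are merged predicates."""
    columns = sorted(set(representative.values()), key=repr)
    rows = []
    for query in workload:
        used = {representative[p] for p in query}
        rows.append([1 if col in used else 0 for col in columns])
    return columns, rows


representative = merge_similar_predicates(workload)
columns, rows = usage_matrix(workload, representative)
```

In this toy workload the predicates `price > 100` and `price > 102` collapse into one column, so the first two queries get identical rows. When no values in the workload are similar, the merge changes nothing and only adds overhead, which is consistent with the abstract's final observation.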
Committee: Karen Davis Ph.D. (Committee Chair); Raj Bhatnagar Ph.D. (Committee Member); Carla Purdy Ph.D. (Committee Member)
Subjects: Computer Science