
Performance Prediction and Resource Bricolage for Database Systems

Posted on: 2015-05-29
Degree: Ph.D
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Li, Jiexing
Full Text: PDF
GTID: 1478390017489623
Subject: Computer Science
Abstract/Summary:
With the growth of the Internet, our ability to generate extremely large amounts of data has dramatically increased. The sheer volume of data that needs to be managed and analyzed has led to the wide adoption of very large and complex data management systems. Although these systems can significantly reduce data processing time, issues such as hardware/software skew, resource contention, and failures are more likely to arise. All large and complex systems have to face this unwanted but inevitable fact. Because of these issues, it becomes harder to anticipate the future state of a system, and a one-time decision model used by schedulers, optimizers, or resource managers will be vulnerable to state changes.

Meanwhile, running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. Very large data processing is increasingly becoming a necessity for modern applications. For database systems running in a heterogeneous cluster, the default data partitioning strategy may overload some of the slow machines while under-utilizing the more powerful ones. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in significant query performance degradation.

It is not uncommon today to have to decide, from a diverse range of computing resources, which ones should be used to build a cluster or to run an application. Very often, when a new cluster is built or an old cluster is upgraded, there are various machines, low-end or high-end, that we can choose from. Different choices may lead to different costs or performance. Thus, we will encounter a resource selection problem if we have a limited budget or a performance goal.

This dissertation makes three contributions by addressing these three problems: query progress estimation, data allocation, and resource selection. The first contribution is the design and implementation of a new cost-based query progress indicator, called GSLPI, to produce more accurate progress estimates. The second contribution is a new technique we call resource bricolage that provides a recommended data partitioning scheme to minimize workload execution time in heterogeneous environments. The third contribution is the formalization of, and solutions for, two resource bricolage problems with either a budget constraint or a time constraint. We show that the solution combining both data allocation and resource selection can achieve significant performance improvement over other alternatives.

This dissertation provides a new vision of deploying performance prediction technology in the areas of query optimization, scheduling, and execution, and it also points to promising directions for future studies to improve the performance of database systems running in the cloud.
Keywords/Search Tags: Data, Performance, Resource, Systems, Large