Performance Prediction and Resource Bricolage for Database Systems

Posted on:2015-05-29

Degree:Ph.D

Type:Dissertation

University:The University of Wisconsin - Madison

Candidate:Li, Jiexing

GTID:1478390017489623

Subject:Computer Science

Abstract/Summary:

With the growth of the Internet, our ability to generate extremely large amounts of data has dramatically increased. The sheer volume of data that needs to be managed and analyzed has led to the wide adoption of very large and complex data management systems. Although these systems can significantly reduce data processing time, issues such as hardware/software skew, resource contention, and failures are more likely to arise. All large and complex systems have to face this unwanted but inevitable fact. Because of these issues, it becomes harder to anticipate the future state of a system, and a one-time decision model used by schedulers, optimizers, or resource managers will be vulnerable to state changes.

Meanwhile, running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds. Very large data processing is increasingly becoming a necessity for modern applications. For database systems running in a heterogeneous cluster, the default data partitioning strategy may overload some of the slow machines while under-utilizing the more powerful ones. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in significant query performance degradation.

It is not uncommon today to have to decide, from a diverse range of computing resources, which ones should be used to build a cluster or to run an application. Very often, when a new cluster is built or an old cluster is upgraded, there are various machines, low-end or high-end, to choose from. Different choices may lead to different costs or performance. Thus, we encounter a resource selection problem whenever we have a limited budget or a performance goal.

This dissertation makes three contributions by addressing these three problems: query progress estimation, data allocation, and resource selection. The first contribution is the design and implementation of a new cost-based query progress indicator, called GSLPI, that produces more accurate progress estimates. The second contribution is a new technique, which we call resource bricolage, that recommends a data partitioning scheme to minimize workload execution time in heterogeneous environments. The third contribution is the formalization of two resource bricolage problems, one with a budget constraint and one with a time constraint, together with solutions for both. We show that the solution combining data allocation and resource selection can achieve significant performance improvement over other alternatives.

This dissertation provides a new vision of deploying performance prediction technology in the areas of query optimization, scheduling, and execution, and it also points to promising directions for future studies on improving the performance of database systems running in the cloud.
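
The abstract's point about heterogeneous clusters, that a parallel query finishes only when its slowest machine finishes, can be illustrated with a small sketch. This is not the dissertation's algorithm: it assumes each machine is described by a single hypothetical processing speed (rows per second), and it compares a uniform split with a simple speed-proportional split. The function names, speeds, and row count are all made up for illustration.

```python
def makespan(rows_per_machine, speeds):
    """Completion time of a parallel scan: the slowest machine finishes last."""
    return max(rows / speed for rows, speed in zip(rows_per_machine, speeds))


def uniform_partition(total_rows, speeds):
    """Give every machine the same number of rows, ignoring its speed."""
    share = total_rows / len(speeds)
    return [share] * len(speeds)


def proportional_partition(total_rows, speeds):
    """Give every machine a share of rows proportional to its speed."""
    total_speed = sum(speeds)
    return [total_rows * s / total_speed for s in speeds]


# Hypothetical cluster: two slow machines and one fast machine (rows/second).
speeds = [100.0, 100.0, 400.0]
total_rows = 1_200_000

uniform_time = makespan(uniform_partition(total_rows, speeds), speeds)
proportional_time = makespan(proportional_partition(total_rows, speeds), speeds)

print(f"uniform split:      {uniform_time:.0f} s")
print(f"proportional split: {proportional_time:.0f} s")
```

Under this simplified model the uniform split is held back by the two slow machines, while the proportional split lets all three machines finish at the same time. A real system would have to account for several resources per machine (CPU, disk, network) and for the actual workload, which a single speed number does not capture.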

Keywords/Search Tags:

Data, Performance, Resource, Systems, Large

Related items

1	Scheduling and resource management for complex systems: From large-scale distributed systems to very large sensor networks
2	Design issues for large-scale distributed systems: Data and resource managements
3	Confidentiality Protection of User Data and Adaptive Resource Allocation for Managing Multiple Workflow Performance in Service-based Systems
4	Design of real-time virtual resource architecture for large-scale embedded systems
5	Research On Performance Optimization Of Large Scale Elastic Resource In IaaS Cloud Computing
6	The Research Of Iaas’s Influence On Enterprise Performance In Large Companies
7	Big Data Analytics Performance for Large Out-of-Core Matrix Solvers on Advanced Hybrid Architectures
8	Performance Tuning In Finacial Applications With Large Volumn Of Data
9	More effective use of high performance systems using sub-batch allocation resource management within multiple component multiple data applications
10	Resource-allocation for OFDMA relay systems