
Study On The Key Issues Of Database Cluster System

Posted on: 2007-10-03 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: W H Gong
GTID: 1118360242961855 | Subject: Computer software and theory
Abstract/Summary:
Large database management systems have become the bottleneck that limits the performance of entire information systems in commercial on-line transaction processing (OLTP) applications with massive data. Traditional methods of improving database performance focus mainly on the hardware configuration and parameter tuning of a stand-alone system, and these methods have clear limitations. Database clusters built on the parallel processing of multiple machines have therefore become a research hot spot. The main objectives of a database cluster system are high performance, high availability and good scalability, and such systems are widely used in high performance computing, the storage and management of massive data, Web services, e-commerce and other fields.

Focusing on the parallel performance of database cluster systems, this dissertation addresses several key issues in homogeneous and heterogeneous clusters, chiefly system architecture, concurrency control of global transactions, load balancing and data distribution.

To achieve large scale and high parallel performance, cluster techniques are applied to the database system, and a general-purpose, highly parallel middleware system is proposed for a shared-nothing database cluster. The middleware presents a single system image to clients and coordinates parallel execution across the cluster through metadata management, a multi-threaded design and parallel pre-processing of transactions. It is well suited to the high performance requirements of commercial OLTP applications and offers a favorable price/performance ratio.
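The abstract does not describe the middleware's internals. The sketch below is a minimal illustration, assuming a metadata catalog that maps each table to key ranges on named nodes, of how such a layer could split a global transaction into per-node subtransactions and dispatch them in parallel on a thread pool. `RangePartition`, `Middleware`, `execute_fn` and the `(table, key, statement)` operation tuple are hypothetical names; a real system would also need atomic commitment across nodes (for example two-phase commit), which is omitted here.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical operation format: (table, partition key, SQL statement text).
Op = tuple[str, int, str]


@dataclass
class RangePartition:
    """Range partitioning of one table (hypothetical metadata entry).
    Keys below upper_bounds[0] live on nodes[0]; keys in
    [upper_bounds[i-1], upper_bounds[i]) live on nodes[i];
    keys at or above upper_bounds[-1] live on nodes[-1]."""
    upper_bounds: list[int]
    nodes: list[str]  # must have len(upper_bounds) + 1 entries

    def node_for(self, key: int) -> str:
        # bisect_right returns how many bounds are <= key, i.e. the index of the owning range.
        return self.nodes[bisect_right(self.upper_bounds, key)]


class Middleware:
    """Hypothetical middleware: routes operations through the metadata
    catalog and runs each node's subtransaction on a worker thread."""

    def __init__(self, catalog: dict[str, RangePartition],
                 execute_fn: Callable[[str, list[Op]], Any], max_workers: int = 8):
        self.catalog = catalog        # metadata: table name -> partitioning
        self.execute_fn = execute_fn  # stand-in for sending a subtransaction to a node
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def split(self, ops: list[Op]) -> dict[str, list[Op]]:
        """Group a global transaction's operations into per-node subtransactions."""
        per_node: dict[str, list[Op]] = {}
        for op in ops:
            table, key, _stmt = op
            node = self.catalog[table].node_for(key)
            per_node.setdefault(node, []).append(op)
        return per_node

    def execute(self, ops: list[Op]) -> dict[str, Any]:
        """Dispatch all subtransactions concurrently and collect each node's result."""
        futures = {node: self.pool.submit(self.execute_fn, node, sub)
                   for node, sub in self.split(ops).items()}
        return {node: f.result() for node, f in futures.items()}

    def close(self) -> None:
        self.pool.shutdown()
```

The client sees one `execute` call (the single system image), while the catalog lookup and thread pool hide where each piece of the transaction actually runs.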
The resulting database cluster system preserves the autonomy of the local database sites while improving parallel performance, removing the performance bottleneck of a single large database system.

For concurrency control, a multi-granularity conflict detection mechanism based on predicate terms extracted from global transactions is proposed so that global transactions execute concurrently and correctly in the cluster. Global deadlocks among conflicting global transactions are prevented by checking whether the predicate conflict graph contains a cycle. This reduces the granularity of deadlock detection and increases the parallelism of concurrent global transactions, without imposing any additional constraints on the local databases. In addition, an improved concurrent scheduling algorithm based on a commit graph is introduced to guarantee the serializable commitment of global transactions. Comparative experiments show that the improved scheduling algorithm increases system throughput and reduces the response time of global transactions.

Balancing workloads across homogeneous and heterogeneous clusters is an effective way to obtain high parallel performance and to improve the utilization of computing resources. The cluster's load balancer evaluates a weighted load status for each homogeneous or heterogeneous node from the combined utilization of its CPU, memory and disk I/O resources under different workloads. On this basis, a threshold-based dynamic load balancing algorithm is presented that accounts both for the combined utilization of the different computing resources and for the effect of different workloads on cluster performance.
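The abstract gives neither the weighting formula nor the threshold values. A minimal sketch, assuming a linear weighted sum of CPU, memory and disk I/O utilization with one weight profile per workload type and fixed low/high thresholds: nodes above the high threshold are treated as overloaded, nodes below the low threshold as underloaded, and new requests go to the least-loaded node that is not overloaded. `WEIGHTS`, `load_index`, `pick_node`, `rebalance_plan` and the thresholds 0.3 and 0.8 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weight profiles (cpu, memory, disk I/O) per workload type; each sums to 1.
WEIGHTS = {
    "cpu_bound": (0.6, 0.2, 0.2),
    "io_bound": (0.2, 0.2, 0.6),
    "mixed": (0.4, 0.3, 0.3),
}
LOW, HIGH = 0.3, 0.8  # hypothetical underload / overload thresholds


@dataclass
class NodeStatus:
    name: str
    cpu: float  # utilization in [0, 1], as a fraction of this node's own capacity
    mem: float
    io: float


def load_index(node: NodeStatus, workload: str) -> float:
    """Weighted combined utilization of one node under the given workload type."""
    w_cpu, w_mem, w_io = WEIGHTS[workload]
    return w_cpu * node.cpu + w_mem * node.mem + w_io * node.io


def pick_node(nodes: list[NodeStatus], workload: str, high: float = HIGH) -> Optional[str]:
    """Route a new request to the least-loaded node that is not overloaded."""
    loads = {n.name: load_index(n, workload) for n in nodes}
    candidates = {name: load for name, load in loads.items() if load <= high}
    if not candidates:
        return None  # every node is overloaded; the caller may queue or reject
    return min(candidates, key=candidates.get)


def rebalance_plan(nodes: list[NodeStatus], workload: str,
                   low: float = LOW, high: float = HIGH) -> list[tuple[str, str]]:
    """Pair overloaded nodes (most loaded first) with underloaded nodes
    (least loaded first) as candidate source/target pairs for shifting work."""
    loads = {n.name: load_index(n, workload) for n in nodes}
    over = sorted((k for k, v in loads.items() if v > high), key=loads.get, reverse=True)
    under = sorted((k for k, v in loads.items() if v < low), key=loads.get)
    return list(zip(over, under))
```

Utilization is already a fraction of each node's own capacity, so the index expresses how busy a node is regardless of its hardware; a fuller treatment of heterogeneity might also scale spare capacity by node speed. Switching the weight profile is one simple way to reflect that different workloads stress different resources.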
Experiments show that the scheme keeps the workloads dynamically balanced and makes efficient use of the computing capabilities of heterogeneous nodes, improving the utilization of heterogeneous resources.

Balanced data distribution is another important factor in the performance of a database cluster. Existing data partitioning methods distribute the global data uniformly across the nodes without considering the different computing capabilities of heterogeneous nodes, so they cannot make full use of the cluster's parallel processing capability. An improved Range partitioning scheme is therefore proposed that distributes data unevenly according to the computing capability of each homogeneous or heterogeneous node, overcoming the drawbacks of traditional even partitioning. Furthermore, when data skew occurs in the cluster, an online data migration algorithm moves hot data from overloaded nodes to underloaded nodes to share the system workload. This approach avoids data skew effectively, improves throughput and keeps the system dynamically balanced.

Finally, several metrics are given to evaluate the performance of parallel processing in homogeneous and heterogeneous cluster systems. TPC-C test results show that the database cluster system has good scalability, sub-linear speedup and a favorable price/performance ratio for parallel OLTP. The database cluster system lays a solid foundation for large-scale OLTP applications in telecommunications, financial services and similar domains.
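The abstract does not define its metrics. As a hedged illustration, the sketch below uses common throughput-based definitions: speedup as cluster throughput divided by one baseline node's throughput, efficiency as cluster throughput divided by the sum of each node's standalone throughput (which accommodates heterogeneous nodes), and price/performance as total cost per unit of throughput, analogous to TPC-C's price per tpmC. These definitions and the names `ClusterRun`, `speedup`, `efficiency`, `price_performance` and `is_sublinear` are assumptions, not necessarily the dissertation's.

```python
from dataclasses import dataclass


@dataclass
class ClusterRun:
    throughput: float        # measured cluster throughput, e.g. transactions per minute
    standalone: list[float]  # throughput of each node when run alone on the same workload
    total_cost: float        # total system price


def speedup(run: ClusterRun, baseline: float) -> float:
    """Cluster throughput relative to one baseline node's standalone throughput."""
    return run.throughput / baseline


def efficiency(run: ClusterRun) -> float:
    """Achieved fraction of the ideal throughput, taken as the sum of the
    nodes' standalone throughputs; this accounts for heterogeneous nodes."""
    return run.throughput / sum(run.standalone)


def price_performance(run: ClusterRun) -> float:
    """Cost per unit of throughput (lower is better), analogous to TPC-C price/tpmC."""
    return run.total_cost / run.throughput


def is_sublinear(run: ClusterRun) -> bool:
    """Sub-linear scaling: the cluster delivers less than the sum of its nodes."""
    return efficiency(run) < 1.0
```

For n identical nodes the efficiency reduces to S_n / n, so sub-linear speedup corresponds to an efficiency below 1.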
Keywords: Database cluster, On-line transaction processing, Concurrency control, Load balancing, Data partition