A Distributed Index Research Based On B+-Tree In Parallel Data Warehouses

Posted on: 2011-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: J T Zhang
GTID: 2178360302494888
Subject: Computer software and theory
Abstract/Summary:
As computer applications continue to expand, data volumes keep growing and search operations become increasingly complex. Because of its high performance, the distributed index has gradually become an effective means of addressing this problem, and it is widely used in research on ubiquitous computing, grid computing, data mining, and data warehouses. On this basis, this thesis studies a distributed index based on the B+-Tree in a parallel data warehouse.

First, to suit the distributed and parallel environment, we present a new distributed index tree named RDB+-Tree, which is built on the B+-Tree and a hash list. For the RDB+-Tree, we introduce a replica assignment strategy. Then, using version control and time control, we present a concurrency control algorithm named VTC-RDB+. It not only achieves concurrency control but also solves the serious delay problem of the Latch-Coupling algorithm.

Second, based on the RDB+-Tree architecture and the replica assignment strategy, we propose an efficient load balancing strategy named SRLB, which relies on dynamically changing the initiator and on a self-adaptive threshold. According to the load state of the computers in the distributed environment, and weighing the advantages and disadvantages of the Sender Initiated Diffusion and Receiver Initiated Diffusion strategies, SRLB dynamically adjusts the threshold and the initiator strategy. The thesis addresses four problems (load distribution, load detection, the definition of the load state, and the initiator strategy) and provides solutions and a structural model for them.

Finally, to evaluate the performance of the RDB+-Tree, we compare it with a centralized operation method in terms of throughput, response time, resource utilization, and load balance. The results demonstrate that, when dealing with massive data, the RDB+-Tree index method outperforms the centralized method: it shortens response time, balances the load, and reduces internal communication traffic.
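The abstract does not give the details of SRLB, so the following is only a minimal sketch of the general idea it describes: load states defined by a threshold that adapts to the current average load, and an initiator that switches between sender-initiated and receiver-initiated balancing depending on overall system load. Every name and constant here (`Node`, `band`, `switch_point`, `capacity`, and the load metric itself) is a hypothetical choice for illustration, not taken from the thesis. The switching rule assumes the commonly cited behavior that sender-initiated schemes suit lightly loaded systems and receiver-initiated schemes suit heavily loaded ones.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Node:
    name: str
    load: float  # hypothetical load metric, e.g. pending index requests


def thresholds(nodes, band=0.25):
    """Self-adaptive thresholds: recomputed from the current average load."""
    avg = mean(n.load for n in nodes)
    return avg * (1 - band), avg * (1 + band)


def classify(node, low, high):
    """Define the load state of one node relative to the thresholds."""
    if node.load > high:
        return "heavy"
    if node.load < low:
        return "light"
    return "normal"


def choose_initiator(nodes, capacity, switch_point=0.5):
    """Pick who starts balancing: senders at low utilization, receivers otherwise."""
    utilization = mean(n.load for n in nodes) / capacity
    return "sender" if utilization < switch_point else "receiver"


def plan_transfers(nodes, capacity):
    """Pair the heaviest nodes with the lightest and move load toward the average."""
    low, high = thresholds(nodes)
    avg = mean(n.load for n in nodes)
    heavy = sorted((n for n in nodes if classify(n, low, high) == "heavy"),
                   key=lambda n: -n.load)
    light = sorted((n for n in nodes if classify(n, low, high) == "light"),
                   key=lambda n: n.load)
    transfers = []
    for src, dst in zip(heavy, light):
        # Move only as much as keeps both nodes on their side of the average.
        amount = min(src.load - avg, avg - dst.load)
        transfers.append((src.name, dst.name, amount))
    return choose_initiator(nodes, capacity), transfers
```

In this sketch the initiator choice only labels which side would start the probe; a real system would also need the load detection and distribution protocols that the thesis treats as separate problems.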
Keywords/Search Tags: Parallel data warehouse, B+-Tree, Distributed index, Concurrency control algorithm, Load balance