Research On Multidimensional Data Index In Cloud Computing System

Posted on: 2017-03-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J He
Full Text: PDF
GTID: 1108330485988406
Subject: Computer software and theory
Abstract/Summary:
The rapid development of cloud computing has driven explosive growth in data sets, which challenges traditional data management technology. Most existing cloud storage systems adopt a distributed hash table to store and retrieve data, and this key-value model achieves high access throughput for single-point queries. However, the model is inefficient for range queries and multidimensional queries. When users submit queries over multiple attribute columns or arrays, the system must launch a MapReduce job to scan the entire data set, because it lacks an efficient secondary index. Multidimensional data indexing in cloud environments has therefore become an active research topic in recent years, and many related results have been published in top-ranked conferences and journals.

This thesis focuses on three topics: multidimensional data indexing combined with the MapReduce model, two-layer multidimensional data indexing based on a client-server architecture, and multidimensional data indexing with dynamic dimension expansion in a purely distributed environment. The main contributions of this thesis are as follows.

1. Most existing cloud computing systems support single-point queries with high performance but lack support for multidimensional queries. This thesis proposes CloudUB, a multidimensional cloud index framework based on the UB-tree. It uses the Z-curve to reduce multiple dimensions to one, and the Z-space partition organizes the multidimensional data efficiently on a B+ tree. During a multidimensional query, this design prunes the data regions that fall outside the requested search space. A mechanism for creating and maintaining the HBase-based CloudUB index is designed, with both online and offline construction algorithms.
This mechanism stores the leaf nodes of the B+ tree in HBase and converts the original multidimensional queries into key-value queries, which are already well supported. Accordingly, CloudUB can sustain highly concurrent access to the index table using MapReduce techniques. Experiments carried out on Hadoop v2.2 with 10 million test data items show that the design supports flexible, efficient, real-time construction of the index table and improves the efficiency of multidimensional search.

2. Based on an intensive study of data management methods in cloud computing, a two-layer multidimensional indexing system, KD-R, is proposed. An R-tree index is constructed over the local data on each data server in the cloud computing system, and these local R-trees together form the bottom layer of the two-layer index. Information about the R-tree on each local node is then sent to the global server to construct a unified KD-tree index, which forms the upper layer. A multidimensional search algorithm based on the two-layer index is designed. A self-adaptive algorithm is also proposed to keep the bottom and upper layers synchronized, and a cost model for measuring the node performance of that algorithm is presented.

3. To meet flexible and dynamically growing requirements, we create a two-layer multidimensional indexing system called CB-index, based on the Chord overlay network and partition-based bitmap techniques, which can dynamically expand dimensions in a purely distributed environment. This framework reduces the cost of reconstructing the original index. The local indexing algorithm of the proposed system is a partition-based bitmap index. By compressing the bit encoding, the algorithm improves the space efficiency of the index.
The global index, in turn, uses Chord as the overlay network, which avoids the bottleneck that a client-server architecture creates at the global server. Building on the prefix-extension encoding of the partition-based bitmap, the research aims to achieve dynamic expansion of the index dimensions. The research also aims to optimize the efficiency of multidimensional search through a layered bitmap design for the indexing system. Experimental results show that the designed indexing system achieves multidimensional search and flexible dimension expansion with high efficiency, and that it offers relatively high space efficiency and improved scalability.
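
The Z-curve dimension reduction described for CloudUB in contribution 1 can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names (`z_encode`, `z_decode`, `build_index`, `range_query`) are hypothetical, a sorted in-memory list stands in for the B+ tree and the HBase table, and the query simply scans the Z-address interval and filters the candidates, whereas a UB-tree additionally skips Z-regions that lie outside the query box.

```python
from bisect import bisect_left, bisect_right


def z_encode(coords, bits):
    """Interleave the low `bits` bits of each coordinate into one Z-address."""
    dims = len(coords)
    code = 0
    for bit in range(bits):
        for d, c in enumerate(coords):
            code |= ((c >> bit) & 1) << (bit * dims + d)
    return code


def z_decode(code, dims, bits):
    """Invert z_encode: recover the coordinate tuple from a Z-address."""
    coords = [0] * dims
    for bit in range(bits):
        for d in range(dims):
            coords[d] |= ((code >> (bit * dims + d)) & 1) << bit
    return tuple(coords)


def build_index(points, bits):
    """Sorted (Z-address, point) pairs; an assumed stand-in for a key-value table."""
    return sorted((z_encode(p, bits), p) for p in points)


def range_query(index, lo, hi, bits):
    """Return the points inside the box [lo, hi] (inclusive on every dimension).

    Every point in the box has a Z-address between z_encode(lo) and
    z_encode(hi), so one key-range scan plus a filter is sufficient,
    although the scan may touch points that lie outside the box.
    """
    keys = [z for z, _ in index]
    start = bisect_left(keys, z_encode(lo, bits))
    stop = bisect_right(keys, z_encode(hi, bits))
    return [
        p
        for _, p in index[start:stop]
        if all(l <= c <= h for l, c, h in zip(lo, p, hi))
    ]


points = [(1, 1), (2, 3), (3, 5), (6, 2), (7, 7), (0, 4)]
index = build_index(points, bits=3)
hits = range_query(index, lo=(2, 2), hi=(6, 5), bits=3)
print(z_encode((3, 5), 3))  # 39
print(hits)                 # [(2, 3), (6, 2), (3, 5)]
```

In this example the point `(0, 4)` has a Z-address inside the scanned interval but lies outside the query box, so the final filter discards it. Avoiding such false candidates is the kind of pruning the abstract attributes to the Z-space partition.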
Keywords/Search Tags:cloud computing, cloud storage, cloud data management, multidimensional index, double-layer index