
OBF-Index: A Distributed Multi-Dimensional Index Based on Ordinal Bloom Filter

Posted on: 2018-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: J K Liu
GTID: 2428330518457954
Subject: Software Engineering Technology
Abstract/Summary:
As data volumes have exploded, Google proposed the MapReduce framework to meet the demands of massive data processing. Hadoop, its open source implementation, has become increasingly popular because of its stability, scalability and many other advantages. In a traditional relational database, retrieval efficiency can be improved by creating an index, but Hadoop does not support an index structure for improving the efficiency of MapReduce. In a previous study, our laboratory proposed BF-MapReduce, a lightweight, multi-dimensional index structure based on Bloom Filter. BF-MapReduce works before the Map phase: it filters out unnecessary input splits, which reduces the number of Mappers and improves the overall efficiency of MapReduce. Bloom Filter is space-efficient, but its false positive rate rises as more data is inserted. To address this, this thesis designs a Bloom Filter variant, the Ordinal Bloom Filter, which uses the ordinal numbers of the hash functions together with corresponding insert and lookup algorithms to keep the false positive rate relatively low. Using the Ordinal Bloom Filter as the underlying storage structure of the index, this thesis proposes OBF-Index. Compared with BF-MapReduce, this thesis focuses on integrating the index structure with Hadoop, and designs and implements a number of services for building, updating, using and optimizing indexes. It also introduces the concept of an index environment profile, which formally describes the working environment of an index, its creation parameters and its expected performance. The index analyzer builds a machine learning model with the index environment profile as its object of analysis, and thereby analyzes and optimizes indexes automatically. Finally, comparative experiments evaluate the performance of OBF-Index against original MapReduce, Hive and BF-MapReduce. The experimental results show that OBF-Index retains the lightweight and efficient character of BF-MapReduce and can improve the performance of MapReduce programs (especially retrieval programs) on large-scale data sets. In addition, because the index environment profile is flexible, different effective indexes can be constructed for different application scenarios, which in turn improves the utilization of the whole cluster.
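The split-filtering idea behind BF-MapReduce can be illustrated with a minimal Python sketch. It uses a standard Bloom Filter, not the Ordinal Bloom Filter, whose insert and lookup algorithms are not described in this abstract. The class name, function names, split identifiers, filter size and sample records are all hypothetical and chosen only for illustration; the thesis itself targets Hadoop.

```python
import hashlib


class BloomFilter:
    """A plain Bloom Filter (illustrative; not the Ordinal variant)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


def build_index(splits):
    """Build one filter per input split (hypothetical index layout)."""
    index = {}
    for split_id, records in splits.items():
        bf = BloomFilter()
        for value in records:
            bf.add(value)
        index[split_id] = bf
    return index


def select_splits(index, key):
    """Keep only the splits whose filter may contain the key."""
    return [sid for sid, bf in index.items() if bf.might_contain(key)]


# Hypothetical sample data standing in for input splits.
splits = {
    "split-0": ["alice", "bob"],
    "split-1": ["carol", "dave"],
    "split-2": ["erin", "frank"],
}
index = build_index(splits)
candidates = select_splits(index, "carol")
```

Because a Bloom Filter has no false negatives, the split that actually holds the key is always among the candidates. A false positive only costs an unnecessary Mapper and never loses a result, which is why lowering the false positive rate translates directly into fewer Mappers.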
Keywords/Search Tags:MapReduce, OBF-Index, Bloom Filter, Index Environment, Multidimensional Index