OBF-Index: A Distributed Multi-Dimensional Index Based On Ordinal Bloom Filter

Posted on:2018-01-21

Degree:Master

Type:Thesis

Country:China

Candidate:J K Liu

GTID:2428330518457954

Subject:Software Engineering Technology

Abstract/Summary:

As data volumes have exploded, Google proposed the MapReduce framework to meet the demands of massive data processing. Hadoop, its open source implementation, has steadily gained users because of its stability, scalability, and other advantages. In a traditional relational database, retrieval efficiency can be improved by creating an index, but Hadoop provides no index structure for improving the efficiency of MapReduce. In a previous study, our laboratory proposed BF-MapReduce, a lightweight multi-dimensional index structure based on the Bloom Filter. BF-MapReduce works before the Map phase: it filters out unnecessary input splits, thereby reducing the number of Mappers and improving the overall efficiency of MapReduce. The Bloom Filter is space-efficient, but its false positive rate grows as more data is inserted. To address this, this thesis designs a Bloom Filter variant, the Ordinal Bloom Filter, which uses the serial numbers of the hash functions and corresponding insert/find algorithms to keep the false positive rate relatively small. Using the Ordinal Bloom Filter as the underlying storage structure of the index, this thesis proposes OBF-Index. Compared with BF-MapReduce, this thesis focuses on integrating the index structure with Hadoop, and designs and implements a number of services for building, updating, using, and optimizing the index. It also puts forward the concept of an index environment profile, which formally describes the index's working environment, creation parameters, and expected performance. The index analyzer builds a machine learning model with the index environment profile as its object of analysis, and thereby realizes automatic analysis and optimization of the index. Finally, comparative experiments evaluate the performance of OBF-Index against original MapReduce, Hive, and BF-MapReduce. The experimental results show that OBF-Index retains the lightweight and efficient character of BF-MapReduce and can improve the performance of MapReduce programs (especially retrieval programs) on large-scale data sets. At the same time, because the index environment profile is flexible, different and effective indexes can be constructed for different application scenarios, which in turn improves the utilization of the whole cluster.
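
The split-filtering idea described above can be illustrated with a minimal sketch. It uses a standard Bloom Filter, not the Ordinal Bloom Filter, whose insert/find algorithms are not specified in this abstract. All names here (`BloomFilter`, `build_split_index`, `candidate_splits`, `expected_false_positive_rate`) are hypothetical and are not taken from the thesis or from Hadoop. The false positive estimate is the usual textbook approximation (1 - e^(-kn/m))^k for m bits, k hash functions, and n inserted items, which shows why the rate grows as data is inserted.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter (assumption: NOT the thesis's Ordinal variant)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Simulate k hash functions by salting SHA-256 with the function index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def expected_false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Textbook approximation (1 - e^(-k*n/m))^k for a standard Bloom filter."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def build_split_index(splits: dict, num_bits: int = 1024, num_hashes: int = 3) -> dict:
    """Build one filter per input split (hypothetical in-memory stand-in for an index)."""
    index = {}
    for split_id, keys in splits.items():
        bf = BloomFilter(num_bits, num_hashes)
        for key in keys:
            bf.add(key)
        index[split_id] = bf
    return index


def candidate_splits(index: dict, key: str) -> list:
    """Return only the splits that may contain the key; the rest can be skipped."""
    return [split_id for split_id, bf in index.items() if bf.might_contain(key)]
```

In this sketch, a retrieval job would call `candidate_splits` before scheduling Mappers and launch tasks only for the returned splits. A split that really contains the key is never dropped, since Bloom Filters have no false negatives; a false positive only costs one unnecessary Mapper.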

Keywords/Search Tags:

MapReduce, OBF-Index, Bloom Filter, Index Environment, Multidimensional Index

Related items

1	Study And Implementation On Distributed Hash Index Structure In P2P Environments
2	Multidimensional Data Index Structure Cloud Environment
3	Parallel Search On Ciphertext Based On Index In Cloud Computing
4	A generalized multidimensional index structure for multimedia data to support content-based similarity searches in a collaborative search environment
5	Research On Multidimensional Cloud Data Index Structure Based On KD Tree And R Tree
6	Design And Implementation Of HBase Hierarchical Auxiliary Index System
7	Research On Data Index Application In The MapReduce Framework
8	A Study And Implementation Of Scalable Data Index Based On Mapreduce
9	A New Index To Evaluate The Researcher’s Impact Based On A Series Points On The Citation Curve