Research On Data Index Application In The MapReduce Framework

Posted on:2016-11-02

Degree:Master

Type:Thesis

Country:China

Candidate:Q N Liu

Full Text:PDF

GTID:2428330482981288

Subject:Computer application technology

Abstract/Summary:

With the development of cloud computing and networking, sensors and microprocessors are now used in every corner of the earth. These rich data sources have produced sustained, explosive growth in data volume, and the complexity of the data is also increasing. We are living in the era of big data. How to manage vast amounts of data and analyze them effectively is a central issue in academic research. The MapReduce programming model is a key technology for big data. It breaks operations on large-scale datasets (greater than 1 TB) into a number of parallel computations that process the data across a large number of computing nodes. Based on the MapReduce framework, this paper studies data indexing technology as follows.

Firstly, this paper analyzes the advantages of the MapReduce programming model for parallel computation and for processing data and tasks on large-scale clusters. A method for optimizing data block partitioning and data storage on the MapReduce framework is proposed. The data are distributed uniformly into data blocks according to the principles of relevance and distribution.

Secondly, this paper analyzes traditional indexing technology, high-dimensional indexing technology, and indexing technology based on the MapReduce framework. To reduce the search space, approximate vector representation uses a simple vector to compactly represent the corresponding high-dimensional vector, while one-dimensional transformation converts the high-dimensional vector into a one-dimensional representation. BC-iDistance combines the two techniques and compresses a d-dimensional vector into a two-dimensional vector. In this paper, high-dimensional vector compression is based on BC-iDistance. A distributed index structure with two layers is designed. During search, three-layer data filtering is realized by using global indexes, local indexes, and the index values of two-dimensional bit codes. In this way, both the search range and the amount of computation on high-dimensional vectors are reduced.

Thirdly, the application of massive data indexing is also studied. The problem of "resource overload" has emerged with the era of big data, which brings new challenges to a variety of data query systems. Personalized recommendation systems are a common application of artificial intelligence. A parallel query method for personalized recommendation is proposed based on the designed double-layer index. The analysis and clustering of massive Web resources can be completed offline on the basis of the data partitioning strategy, which improves the efficiency of the application.

Finally, the validity of the proposed method is verified by experiments. The experiments show that the data partitioning strategy and the MapReduce-based high-dimensional data index are effective and practical for improving the query efficiency of high-dimensional data.
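
The compression of a d-dimensional vector into a two-dimensional one, as described in the abstract, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The one-dimensional key follows the classic iDistance mapping (partition index times a constant c, plus the distance to the nearest reference point). The bit code is a simplified assumption made here for illustration (one bit per dimension, set when the coordinate is at least the reference point's coordinate); the abstract does not specify how BC-iDistance builds it. The names idistance_key, bit_code, compress, refs and c are all hypothetical.

```python
import math


def idistance_key(point, refs, c):
    """Return (partition index, one-dimensional iDistance key) for a point.

    refs is a list of reference points; c is a constant assumed to be
    larger than any distance inside a partition, so key ranges of
    different partitions do not overlap.
    """
    dists = [math.dist(point, r) for r in refs]
    i = min(range(len(refs)), key=dists.__getitem__)  # nearest reference point
    return i, i * c + dists[i]


def bit_code(point, ref):
    """Assumed bit code: one bit per dimension, 1 if coordinate >= reference's."""
    code = 0
    for x, o in zip(point, ref):
        code = (code << 1) | (1 if x >= o else 0)
    return code


def compress(point, refs, c):
    """Compress a d-dimensional point into a (key, bit code) pair."""
    i, key = idistance_key(point, refs, c)
    return key, bit_code(point, refs[i])


if __name__ == "__main__":
    refs = [(0, 0, 0), (10, 10, 10)]
    print(compress((1, 0, 0), refs, c=100))
    print(compress((10, 10, 7), refs, c=100))
```

In a sketch like this, a query could first use the key range to discard whole partitions and then compare bit codes to skip candidates before computing any full d-dimensional distance, which is the kind of layered filtering the abstract refers to.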

Keywords/Search Tags:

MapReduce, data index, KNN query, high-dimensional vector, cloud computing

Related items

1	Study On Indexing And Query Processing Techniques For High Dimensional Data
2	Research On Query And Optimization Technology To The Location Based Service Below The Cloud Environment
3	Research On Data Secure Storage And Query Methods For Cloud
4	Query Optimization Based On Mapreduce In The Cloud
5	The Research Of Large-scale Spatial Nearest Neighbor Query In Cloud Environments
6	Research On Optimization Of Map Reduce For Interactive Analysis On Big Data
7	Research And Implementation Of The Big Spatial Data Join Query Processing Algorithms In Cloud Environment
8	Study On High Availability And High Efficiency Optimization Of Mapreduce In Cloud Computing
9	Cloud Computing And A Number Of Data Mining Algorithms Mapreduce Research
10	Performance Optimization And Applications Of MapReduce In Cloud Computing