
Research On The Key Problems Of Big Data Index Technology

Posted on: 2017-02-25  Degree: Master  Type: Thesis
Country: China  Candidate: W T Gan  Full Text: PDF
GTID: 2348330485981327  Subject: Systems analysis and integration
Abstract/Summary:
In recent years, rapid technological development has pushed every sector toward informatization, especially scientific research, the Internet, and e-commerce. Data volumes are growing at an unprecedented rate, and data centers are expanding just as quickly. How to manage big data effectively and improve the ability to query and analyze it is a hot research topic in both industry and academia. Indexing is an effective way to speed up data queries. However, the way big data is stored has changed fundamentally, so the mature indexing techniques of traditional relational databases cannot be applied directly to massive data processing. Given the volume and complexity of the data, a big data indexing mechanism must support a variety of query types, efficient retrieval, and easy maintenance. Solving the big data query processing problem therefore requires a new index structure designed for big data environments. This thesis presents a Hadoop-based position-coding index, the Location Bitcode Tree (LB-Tree). It exploits the strengths of the MapReduce programming model in processing large-scale data and, based on the characteristics of KNN queries, optimizes the data storage strategy of the MapReduce framework: groups of similar data are stored separately so that MapReduce parallelism is maximized during query processing. First, the massive data set is clustered. Then, according to the distribution of each cluster, the data objects are divided into concentric layers centered on the cluster centroid, and each layer is represented by a binary code of a different length. The codes of all data objects are organized into a tree index structure that reflects the search paths of frequently queried data. A query uses the index structure to determine its search space quickly, which improves retrieval efficiency. The thesis verifies the effectiveness and accuracy of the proposed algorithm and method through theoretical analysis and experiments. The experimental results show that KNN query efficiency with the Hadoop-based Location Bitcode Tree is significantly improved and that the approach scales well.
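The layering idea in the abstract can be illustrated with a small single-machine sketch. It is not the thesis's LB-Tree implementation. The abstract does not specify the encoding, so the variable-length code used here (layer 0 is "0", layer 1 is "10", layer 2 is "110") is an assumption, as are the function names, the fixed `layer_width` parameter, and the use of precomputed centroids in place of a clustering step. A flat dictionary stands in for the tree, and no Hadoop or MapReduce is involved.

```python
import math
from collections import defaultdict

def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))

def layer_code(layer):
    """Hypothetical variable-length code: deeper layers get longer codes."""
    return "1" * layer + "0"

def build_index(points, centroids, layer_width):
    """Map (cluster id, layer code) to the points in that concentric layer."""
    index = defaultdict(list)
    for p in points:
        c = nearest_centroid(p, centroids)
        layer = int(math.dist(p, centroids[c]) // layer_width)
        index[(c, layer_code(layer))].append(p)
    return index

def knn_candidates(index, centroids, layer_width, query, k):
    """Approximate KNN: search the query's own cluster only, widening the
    set of layers around the query's layer until at least k candidates
    are found, then rank the candidates by true distance."""
    c = nearest_centroid(query, centroids)
    home = int(math.dist(query, centroids[c]) // layer_width)
    # Deepest occupied layer in this cluster (code length minus one).
    max_layer = max((len(code) - 1 for (cid, code) in index if cid == c),
                    default=-1)
    candidates = []
    radius = 0
    while len(candidates) < k and (home - radius >= 0
                                   or home + radius <= max_layer):
        for layer in {home - radius, home + radius}:
            if layer >= 0:
                candidates.extend(index.get((c, layer_code(layer)), []))
        radius += 1
    return sorted(candidates, key=lambda p: math.dist(p, query))[:k]

if __name__ == "__main__":
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    points = [(0.0, 0.0), (0.5, 0.0), (1.5, 0.0), (2.5, 0.0),
              (10.0, 10.0), (10.5, 10.0)]
    index = build_index(points, centroids, layer_width=1.0)
    print(knn_candidates(index, centroids, 1.0, (1.4, 0.0), k=2))
```

Because the lookup stays inside one cluster, the result is only approximate when true neighbors lie across a cluster boundary. An exact method would also need to check neighboring clusters.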
Keywords/Search Tags:big data, index, KNN search, MapReduce