
A Study and Implementation of a Scalable Data Index Based on MapReduce

Posted on: 2018-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Shao
Full Text: PDF
GTID: 2348330518498977
Subject: Computer software and theory
Abstract/Summary:
With the advent of the information age, fields such as the mobile Internet, the Internet of Things, and scientific research are producing massive amounts of complex, heterogeneous data. Traditional database technology, although mature, faces challenges from large data volumes, complex data structures, scalability, and fault tolerance. In recent years cloud computing has developed rapidly; its scalability and high availability compensate for these shortcomings of traditional databases, and the MapReduce model proposed by Google has been widely adopted. However, as research and practice have deepened, problems with the MapReduce computational model have gradually emerged. One of them is that MapReduce does not support data indexes, which makes complex queries inefficient; this has become a research hot spot in recent years. Although some studies have achieved good results in specific areas, their data indexing methods are neither scalable nor platform independent, because they target specific query problems and rely on Hadoop rather than on the MapReduce model itself.

This thesis studies distributed data indexing for the MapReduce model in depth: it models the research objectives, summarizes the shortcomings of related work, and analyzes the difficulties of the problem. It then designs and implements a scalable data indexing method based on the generalized search tree and the Raft algorithm. The method, named MRIndexer, offers a scalable index structure, efficient query performance, high availability, and platform independence.

First, MRIndexer is designed as a non-intrusive indexing service that combines the MapReduce model with the generalized search tree, and an efficient method for building, searching, updating, and maintaining indexes on top of a distributed file system and memory is designed and implemented. By exposing an interface to the user, MRIndexer allows the underlying index structure to be extended, so that users can implement different index structures, such as the B+ tree and the R tree, according to the characteristics of their queries. MRIndexer also provides a way to use several index structures together, so that it can serve complex query tasks such as multi-dimensional queries.

Second, building on a study of the Raft algorithm, a message-passing mechanism between MRIndexer modules is designed and implemented to ensure the consistency and high availability of the data index service. Drawing on the characteristics of MapReduce and on experience from traditional databases and distributed computing, the thesis also analyzes, designs, and optimizes memory management and query optimization methods so that MRIndexer can provide an efficient and stable indexing service over long periods.

Finally, a prototype of MRIndexer is implemented on Hadoop and on MPI+Phoenix, with R tree and B+ tree index structures. To verify MRIndexer's performance, index scalability, high availability, and platform independence, a variety of experiments, including multiple groups of horizontal and vertical tests, are designed and carried out. Analysis of the test results shows that the data indexing method is effective.
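The idea of a user-extensible index sitting in front of the map phase can be illustrated with a minimal Python sketch. It is not taken from the thesis: the names `IndexStructure`, `SortedKeyIndex`, `build_index`, and `range_query` are hypothetical, a sorted list stands in for a B+ tree, and in-memory dictionaries stand in for input splits on a distributed file system. It only shows how an index lookup can restrict which splits a range query has to scan.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, bisect_right


class IndexStructure(ABC):
    """Hypothetical extension point: one subclass per underlying index type."""

    @abstractmethod
    def insert(self, key, split_id):
        """Record that `key` occurs in the input split `split_id`."""

    @abstractmethod
    def search(self, low, high):
        """Return the set of split ids holding keys in [low, high]."""


class SortedKeyIndex(IndexStructure):
    """Stand-in for a B+ tree: a sorted list of (key, split_id) pairs."""

    def __init__(self):
        self._entries = []

    def insert(self, key, split_id):
        entry = (key, split_id)
        # Keep the list sorted so that range lookups can use binary search.
        self._entries.insert(bisect_left(self._entries, entry), entry)

    def search(self, low, high):
        keys = [key for key, _ in self._entries]
        start = bisect_left(keys, low)
        stop = bisect_right(keys, high)
        return {split_id for _, split_id in self._entries[start:stop]}


def build_index(splits, index):
    """Populate `index` from a dict of split_id -> list of (key, value)."""
    for split_id, records in splits.items():
        for key, _value in records:
            index.insert(key, split_id)
    return index


def range_query(splits, index, low, high):
    """Scan only the splits the index reports as relevant."""
    hits = []
    for split_id in sorted(index.search(low, high)):
        for key, value in splits[split_id]:
            if low <= key <= high:
                hits.append((key, value))
    return sorted(hits)
```

Under these assumptions, a different structure (for example an R tree for spatial keys) would be added by writing another `IndexStructure` subclass, leaving `build_index` and `range_query` unchanged.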
Keywords/Search Tags: MapReduce, Data Index, Generalized Search Tree, Raft