Studies On Query Processing And Optimization Techniques Based On MapReduce

Posted on:2014-10-19

Degree:Doctor

Type:Dissertation

Country:China

Candidate:L L Ding

GTID:1318330482955725

Subject:Computer system architecture

Abstract/Summary:

With the rapid development of computer and information technology, the volume of data on the Internet is snowballing. Faced with this huge amount of information, how to process and analyze massive data effectively and extract the information that interests users has become a topic of common concern in both industry and academia. Query processing and optimization techniques for massive data have gradually become a new research focus in the database field. Those built on the MapReduce framework have received particularly wide attention, and they have both far-reaching theoretical significance and important practical value.

This dissertation studies query processing and optimization techniques on the MapReduce framework in depth. It enhances the processing performance of the framework itself and improves the performance of top-k, kNN, Skyline, and join queries. The contributions are summarized as follows:

(1) An improved MapReduce framework with lightweight communication mechanisms, named ComMapReduce, is proposed. It reduces the communication overhead of the original MapReduce framework by generating shared information. ComMapReduce can effectively filter out unpromising data in the Map phase and thereby decrease the input data volume of the Reduce phase. It substantially enhances the performance of the original MapReduce framework without affecting that framework's basic characteristics.

(2) For the Skyline query, a Skyline query processing algorithm on MapReduce is proposed first. Second, a Skyline query processing algorithm on the ComMapReduce framework is presented. This algorithm exploits the fact that the final result of a Skyline query is much smaller than the original data: it efficiently filters out intermediate results that cannot belong to the final result and so decreases the output of the Map phase. As a result, execution efficiency improves and network cost is reduced. Finally, an optimization algorithm is given to further enhance the performance of the Skyline query on both the MapReduce and ComMapReduce frameworks.

(3) For the probabilistic Skyline query over uncertain data, the characteristics of the query are first analyzed and summarized. Second, a two-phase processing approach, named filter-refine, is proposed. The approach converts the non-decomposable probabilistic Skyline query into two decomposable problems: obtaining the global candidate set and the affect set, and computing the final probabilistic Skyline results. The global candidate set and the affect set are obtained in the filter phase, and the final probabilistic Skyline results are computed in the refine phase. Finally, probabilistic Skyline query processing algorithms on the MapReduce and ComMapReduce frameworks are proposed using the filter-refine approach. The communication strategies of ComMapReduce effectively filter out numerous unpromising intermediate results, which greatly enhances the performance of the probabilistic Skyline query over uncertain data.

(4) Join query processing algorithms on the MapReduce framework are investigated in depth, including two-way and multi-way join query processing algorithms. Join query processing algorithms on the ComMapReduce framework are then proposed. The shared information obtained through the communication mechanisms of ComMapReduce is used to filter out unpromising tuples, which avoids transmitting and sorting those tuples, reduces the processing cost, and enhances the efficiency of the join algorithms. Finally, an efficient method for analyzing the join order is proposed to optimize the processing order of join queries, which further improves join query performance.
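The Map-side filtering idea behind contributions (1) and (2) can be illustrated with a minimal single-process sketch in Python. It is not the dissertation's algorithm: the abstract does not say how ComMapReduce chooses or distributes its shared information, so the choice of filter point (the data point with the smallest coordinate sum), the function names, and the "smaller is better" convention are all assumptions made for illustration. Partitions are plain lists standing in for Map task inputs.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def local_skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point in the input dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated by a point already kept
        result = [q for q in result if not dominates(p, q)]
        result.append(p)
    return result


def choose_shared_filter(partitions: Sequence[Sequence[Point]]) -> Point:
    """Hypothetical shared information: every partition proposes its point
    with the smallest coordinate sum and the overall best one is shared."""
    candidates = [min(part, key=sum) for part in partitions if part]
    return min(candidates, key=sum)


def map_phase(partition: Sequence[Point], shared_filter: Point) -> List[Point]:
    """Discard points dominated by the shared filter, then emit the local skyline."""
    survivors = [p for p in partition if not dominates(shared_filter, p)]
    return local_skyline(survivors)


def reduce_phase(map_outputs: Sequence[Sequence[Point]]) -> List[Point]:
    """Merge all Map outputs and compute the global skyline."""
    merged = [p for out in map_outputs for p in out]
    return local_skyline(merged)


def skyline_with_filter(partitions: Sequence[Sequence[Point]]):
    """Run the simulated job; return the skyline and the number of
    points emitted by the Map phase."""
    shared = choose_shared_filter(partitions)
    outputs = [map_phase(part, shared) for part in partitions]
    emitted = sum(len(out) for out in outputs)
    return sorted(reduce_phase(outputs)), emitted


if __name__ == "__main__":
    parts = [
        [(1, 9), (5, 5), (9, 9)],
        [(2, 2), (8, 1), (7, 7)],
        [(3, 8), (6, 6), (9, 2)],
    ]
    skyline, emitted = skyline_with_filter(parts)
    print(skyline, emitted)
```

Because the filter point is itself a data point, any point it dominates cannot be in the final Skyline, so the result is unchanged while fewer intermediate points reach the Reduce phase. In this toy input the Map phase emits 3 points with the filter, compared with 7 if each partition emitted its unfiltered local skyline.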

Keywords/Search Tags:

MapReduce, query processing, Hadoop, communication functions

Related items

1	Study And Implementation On Uncertain Query Processing Technology Using MapReduce
2	Data intensive query processing for Semantic Web data using Hadoop and MapReduce
3	Research On Distributed Data Query Based On Hadoop
4	Research And Application Of Big Data Migration And Query Based-on Hadoop Platform
5	Design And Implementation Of Massive Log Data Quasi-Real-Time Query System Based On Hadoop
6	Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop
7	Research On Big Data Processing System Based On MapReduce Parallel Processing Framework
8	Research On Data Cube Technology Based On MapReduce
9	The Research And Analysis Of Hadoop Small File Processing Method
10	Investigating MapReduce framework extensions for efficient processing of geographically scattered datasets