Research and Implementation of the Aggregate-Join Query Optimization Approach Based on MapReduce

Posted on: 2014-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: T T Li
GTID: 2308330473453727
Subject: Software engineering
Abstract/Summary:
In recent years, as computing has spread to fields such as the internet, sensor networks, and scientific data analysis, data volumes have grown so quickly that traditional relational databases can no longer handle analysis tasks on massive data. In response, many technologies for managing and analyzing massive data have emerged. Among them, the distributed file systems Google GFS and Hadoop HDFS and the MapReduce distributed parallel programming model are the most popular in academia and industry. MapReduce follows the principle of moving computation to the data rather than moving data to the computation, which lets it process massive data in parallel on large numbers of commodity machines. In addition, an enterprise’s competitiveness today depends largely on how efficiently it can mine its data, and among the many kinds of data analysis operations, the aggregate-join query is the most essential and common. It is therefore necessary to study aggregate-join query optimization approaches based on MapReduce.
This thesis first derives the optimization goals for aggregate-join queries from an analysis of practical requirements and, based on those goals, presents an energy efficiency evaluation model that takes both energy consumption and performance into account. It then studies optimization approaches at three levels. At the query algorithm level, it proposes six algorithms in three categories. At the load balancing level, it proposes a specific optimization model and a detailed solution. At the execution plan level, it proposes a cost prediction model and a feasible strategy for selecting the optimal plan from the six candidate algorithms.
Finally, this thesis verifies the correctness of the models and the effectiveness of the proposed optimization approaches through extensive experiments. The experimental results show the following: the proposed algorithms are feasible, complete the query tasks effectively, and suit different query scenarios; the load balancing optimization approach is effective, improving algorithm performance and reducing energy consumption to a certain extent; the cost prediction model is correct, accurately evaluating algorithm cost and selecting the optimal execution plan; and the energy efficiency evaluation model is reasonable, striking a good trade-off between energy consumption and performance.
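For readers unfamiliar with the term, an aggregate-join query joins two tables and then aggregates the joined rows. The following is a minimal in-memory sketch of one generic way such a query can be phrased as map, shuffle, and reduce steps (a plain repartition join followed by a sum). It is not one of the six algorithms proposed in the thesis, which the abstract does not describe. The table names `orders` and `customers`, their columns, and all function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Illustrative only: a generic repartition join plus aggregation,
# roughly equivalent to
#   SELECT region, SUM(amount)
#   FROM orders JOIN customers USING (cust_id)
#   GROUP BY region
# The tables and column names are assumptions, not taken from the thesis.

def map_phase(orders, customers):
    """Tag each record with its source table and emit it under the join key."""
    for cust_id, amount in orders:
        yield cust_id, ("order", amount)
    for cust_id, region in customers:
        yield cust_id, ("customer", region)

def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Join records that share a key and emit (region, amount) pairs."""
    for values in groups.values():
        regions = [v for tag, v in values if tag == "customer"]
        amounts = [v for tag, v in values if tag == "order"]
        for region in regions:
            for amount in amounts:
                yield region, amount

def aggregate(pairs):
    """Sum the joined amounts per region (the aggregation step)."""
    totals = defaultdict(int)
    for region, amount in pairs:
        totals[region] += amount
    return dict(totals)

def aggregate_join(orders, customers):
    joined = reduce_phase(shuffle(map_phase(orders, customers)))
    return aggregate(joined)

if __name__ == "__main__":
    orders = [(1, 10), (1, 5), (2, 7), (3, 4)]
    customers = [(1, "east"), (2, "west"), (4, "east")]
    print(aggregate_join(orders, customers))
```

In this sketch the join and the aggregation run as two separate passes; on a real cluster each pass would typically be its own MapReduce job.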
Keywords/Search Tags: Aggregate-Join Query, MapReduce, Energy Optimization, Query Optimization, Evaluation Model