| With the explosion of data over the past decade, big data has become a research hotspot in the information field. Many cloud-based distributed data processing platforms, such as Hadoop, Hive, and Pig, have been proposed to provide efficient and cost-effective solutions for big data query processing. However, most current research improves query processing performance from the perspective of the whole system without considering features of the queries themselves, such as query similarity; this causes a large amount of redundant computation and reduces query execution efficiency. Moreover, almost all existing work either translates queries into MapReduce tasks according to traditional relational query optimization rules or optimizes queries simply by reducing the number of MapReduce tasks, ignoring the execution characteristics of the MapReduce framework, which limits improvements in multi-query processing performance. To solve these problems, in this thesis we propose a multi-query optimization framework (Multi-Q) for MapReduce-oriented cloud environments, which not only exploits the dependences between multiple queries to reuse query results, but also uses optimal query sub-structures to reuse query structure. Specifically, the thesis covers the following two topics: 1) to realize query results reuse, a cluster-based partition algorithm called CPA is first used to logically partition the search range of the query workload. Then, a Multi-query Reuse Dependence Graph (MRDG) construction method, built on the cluster-based partition results, is presented to depict the dependences between the multiple queries. 
Finally, a Multi-Q processing algorithm based on the MRDG is put forward to achieve query results reuse and reduce redundant computation; 2) to achieve query structure reuse, an execution cost model based on MapReduce is first presented to evaluate the execution cost of the different phases of MapReduce, from which some optimal query sub-structures are derived. Second, on the basis of the execution cost model, a query structure reuse optimization algorithm is designed, which achieves query structure reuse and reduces query execution cost by embedding the optimal query sub-structures into the execution plan. Finally, these two query optimization methods are combined to improve overall query processing performance. We evaluate our approach by deploying the Multi-Q system, built on Hadoop, in a real cloud environment, SEU-Cloud, and conducting extensive experiments on the standard TPC-H dataset. The results verify that the Multi-Q system outperforms Hive and significantly reduces redundant query cost, thus boosting query processing performance. |
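
The idea of query results reuse described in the abstract can be illustrated with a toy sketch. It is not the CPA, MRDG, or Multi-Q algorithm from the thesis. It assumes, purely for illustration, that each query is an inclusive range predicate on one numeric attribute and that a query may reuse the results of the narrowest query whose range contains its own. All names (`RangeQuery`, `build_reuse_graph`, `run_with_reuse`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical query: select rows r with lo <= r <= hi."""
    name: str
    lo: int
    hi: int

    def contains(self, other: "RangeQuery") -> bool:
        return self.lo <= other.lo and other.hi <= self.hi

    def width(self) -> int:
        return self.hi - self.lo


def build_reuse_graph(queries):
    """Return (queries ordered widest first, parent map).

    parent[q.name] is the name of the narrowest earlier query whose range
    contains q, or None if q must read the base data. Only earlier (wider
    or equal) queries are considered, so the graph has no cycles.
    """
    ordered = sorted(queries, key=lambda q: q.width(), reverse=True)
    parent = {}
    for i, q in enumerate(ordered):
        containers = [p for p in ordered[:i] if p.contains(q)]
        if containers:
            parent[q.name] = min(containers, key=lambda p: p.width()).name
        else:
            parent[q.name] = None
    return ordered, parent


def run_with_reuse(rows, queries):
    """Evaluate all queries, filtering a parent's result where possible.

    Returns (results by query name, total number of rows scanned).
    """
    ordered, parent = build_reuse_graph(queries)
    results = {}
    scanned = 0
    for q in ordered:
        source = rows if parent[q.name] is None else results[parent[q.name]]
        scanned += len(source)
        results[q.name] = [r for r in source if q.lo <= r <= q.hi]
    return results, scanned


if __name__ == "__main__":
    rows = list(range(100))
    workload = [
        RangeQuery("q1", 0, 49),
        RangeQuery("q2", 10, 19),
        RangeQuery("q3", 12, 15),
        RangeQuery("q4", 60, 79),
    ]
    results, scanned = run_with_reuse(rows, workload)
    print("rows scanned with reuse:", scanned)
    print("rows scanned without reuse:", len(rows) * len(workload))
```

In this toy workload, `q2` filters the result of `q1` and `q3` filters the result of `q2`, so 260 rows are scanned instead of 400. A real system would also have to weigh the cost of materializing and reading intermediate results, which this sketch ignores.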