The Research On Query Optimization Technology Based On Big Data Platform

Posted on: 2019-06-25
Degree: Master
Type: Thesis
Country: China
Candidate: P X Fei
GTID: 2428330593950478
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet and Internet of Things technology, data volumes in all walks of life are growing explosively. Massive data contains tremendous value, and extracting valuable information from it efficiently requires statistical analysis. In conventional massive data applications, SQL-like query statements are usually used to perform this analysis. There are two main scenarios: statistical analysis of static datasets and statistical analysis of incremental datasets. For static datasets, it is common to run a batch of queries at once. These queries have three characteristics: each query covers a different query region, many query regions are shared among multiple queries, and queries are concentrated in a few hot query regions. However, existing query optimization methods are very simple and do not make good use of these characteristics. They ignore the reuse relationships among multiple queries, which causes redundant computation and reduces computation efficiency. For incremental datasets, a batch of query statements is usually run periodically. Because the dataset is continuously expanding, the previous computation results become out of date, so the whole dataset has to be recalculated every time. This thesis focuses on the multiple range query optimization problem and the incremental computation optimization problem, and proposes an optimization method for each. These methods can effectively reduce redundant computation and improve computation efficiency. The work of this thesis is summarized as follows:

(1) To improve the efficiency of processing multiple range queries, this thesis exploits the characteristics of multiple range queries and designs a MapReduce materialization strategy based on region clustering. This strategy makes good use of the region overlap between range queries and the regional concentration of the query set. Its basic idea is region clustering: the query space is first divided, and then the greedy clustering method proposed in this thesis is used to cluster the queries. Queries with a high degree of similarity are placed in one group, and each group produces one materialized view. The proposed method avoids uncontrolled growth of the materialized views, so that each query is processed within a small region. Finally, a verification experiment was conducted. The experimental results show that the proposed method can effectively improve the processing efficiency of the query set.

(2) To improve the efficiency of incremental queries, this thesis proposes an incremental computation reuse model based on combinable operators, and designs and implements an incremental query optimization system on the Spark platform. The main idea is to obtain the final computation result by merging the historical computation result with the computation result of the newly added data. To reuse previous computation results, this thesis defines cache metadata and designs a computation matching method. It establishes a cache reuse cost model to evaluate the cost of reusing a cache entry. To rewrite an execution plan into an incremental execution plan, it proposes a set of execution plan rewriting rules. To support incremental query optimization, it also extends Spark. Because classic cache replacement algorithms are not suitable for this system, it further proposes a cache replacement algorithm based on an integrated value. Finally, experiments verify that the proposed incremental query optimization system can effectively improve incremental query efficiency.
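The merging idea in (2) can be illustrated with a minimal sketch in plain Python rather than Spark. It assumes that a "combinable operator" is an aggregate whose partial results over two disjoint data slices can be merged into the result over their union (count, sum, min, max, and the mean derived from count and sum). The names `Partial`, `aggregate`, and `merge_results` are hypothetical and do not come from the thesis; the cache metadata, cost model, and plan rewriting rules are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    """Partial aggregate state for one group (hypothetical illustration)."""
    count: int
    total: float
    minimum: float
    maximum: float

    @staticmethod
    def of(values):
        # Build the partial state from a non-empty collection of values.
        vals = list(values)
        return Partial(len(vals), sum(vals), min(vals), max(vals))

    def merge(self, other):
        # Combine two partial states computed over disjoint slices of data.
        return Partial(
            self.count + other.count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
        )

    @property
    def mean(self):
        # The mean is not merged directly; it is derived from count and total.
        return self.total / self.count


def aggregate(rows):
    """Group (key, value) rows by key and compute a Partial per group."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {key: Partial.of(vals) for key, vals in groups.items()}


def merge_results(cached, delta):
    """Merge a cached historical result with the result over new data only."""
    merged = dict(cached)
    for key, part in delta.items():
        merged[key] = merged[key].merge(part) if key in merged else part
    return merged


# Historical data was aggregated in an earlier run and its result cached.
history = [("a", 1.0), ("b", 4.0), ("a", 3.0)]
cached_result = aggregate(history)

# Only the newly arrived rows are aggregated in the current run.
new_rows = [("a", 5.0), ("c", 2.0)]
incremental_result = merge_results(cached_result, aggregate(new_rows))

# Recomputing from scratch gives the same answer.
full_result = aggregate(history + new_rows)
```

In this sketch the incremental run touches only `new_rows`, yet `incremental_result` matches `full_result`, which is the property that makes reuse of cached results safe for this class of operators.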
Keywords/Search Tags: Multiple Query Optimization, Regional Clustering, Incremental Computation, Computation Reuse