The Research On Query Optimization Technology Based On Big Data Platform

Posted on:2019-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:P X Fei

Full Text:PDF

GTID:2428330593950478

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of Internet and Internet of Things technology, data volumes in all walks of life are growing explosively. Massive data contains tremendous value, and extracting valuable information from it efficiently requires statistical analysis. In conventional massive data applications, SQL-like query statements are usually used to perform this analysis. There are two main scenarios: statistical analysis of static datasets and statistical analysis of incremental datasets.

For static datasets, it is common to run a batch of queries at once. These queries have three characteristics: each query covers a different query region, multiple queries share a large number of common query regions, and queries are concentrated in a few hot regions. However, existing query optimization methods are simple and do not make good use of these characteristics. They ignore the reuse relationships between multiple queries, which causes redundant computation and reduces efficiency. For incremental datasets, a batch of query statements is usually run periodically. Because the dataset keeps growing, previous computation results become out of date, so the query results must be recomputed over the entire dataset each time.

This thesis focuses on the multiple range query optimization problem and the incremental computation optimization problem and proposes an optimization method for each. These methods effectively reduce redundant computation and improve computation efficiency. The work is summarized as follows:

(1) To improve the efficiency of processing multiple range queries, this thesis exploits the characteristics of such queries and designs a MapReduce materialization strategy based on region clustering. The strategy makes good use of the region overlap between range queries and the regional concentration of the query set. Its basic idea is region clustering: first the query space is partitioned, and then the greedy clustering method proposed in this thesis groups the queries. Queries with a high degree of similarity are placed in one group, and each group produces one materialized view. This avoids uncontrolled growth of the materialized views and confines query processing to a small region. A verification experiment shows that the proposed method effectively improves the efficiency of processing the query set.

(2) To improve the efficiency of incremental queries, this thesis proposes an incremental computation reuse model based on combinable operators, and designs and implements an incremental query optimization system on the Spark platform. The main idea is to obtain the final result by merging the historical computation result with the computation result for the newly added data. To reuse previous computation results, the thesis defines cache metadata and designs a computation matching method. It establishes a cache reuse cost model to evaluate the cost of reusing a cached result, and proposes execution plan rewriting rules that turn an execution plan into an incremental execution plan. To support incremental query optimization, the thesis also extends Spark. Because classic cache replacement algorithms are not suitable for this system, it further proposes a cache replacement algorithm based on an integrated value. Experiments verify that the proposed incremental query optimization system effectively improves incremental query efficiency.
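The merge step at the heart of item (2) can be illustrated with a minimal Python sketch. It is not the thesis's Spark implementation. The class `AvgState`, the helper `aggregate`, and the sample rows are hypothetical names and data chosen for illustration. The sketch assumes that a "combinable operator" is an aggregate whose partial states can be merged, such as an average kept as a (sum, count) pair.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AvgState:
    """Hypothetical partial state for AVG.

    It stores sum and count rather than the mean itself, because two
    means cannot be merged without knowing how many rows each covers.
    """
    total: float = 0.0
    count: int = 0

    def merge(self, other: "AvgState") -> "AvgState":
        # Merging is associative and commutative, so a cached state and
        # a state for new data can be combined in any order.
        return AvgState(self.total + other.total, self.count + other.count)

    def result(self) -> float:
        if self.count == 0:
            raise ValueError("no rows aggregated")
        return self.total / self.count


def aggregate(rows: Iterable[float]) -> AvgState:
    """Fold a collection of values into a single partial state."""
    state = AvgState()
    for value in rows:
        state = state.merge(AvgState(value, 1))
    return state


# Illustrative data: rows already processed in an earlier run, and rows
# that arrived since then.
historical_rows = [10.0, 20.0, 30.0]
new_rows = [40.0, 60.0]

cached = aggregate(historical_rows)   # would be read from the cache
delta = aggregate(new_rows)           # only the new data is scanned
incremental = cached.merge(delta)     # merged final state

# Reference result obtained by recomputing over the whole dataset.
full = aggregate(historical_rows + new_rows)
```

In this sketch `incremental` and `full` hold the same state, but the incremental path scans only `new_rows`. Operators whose partial results cannot be merged this way (an exact median, for example) would not fit the same pattern without additional state.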

Keywords/Search Tags:

Multiple Query Optimization, Regional Clustering, Incremental Computation, Computation Reuse

Related items

1	The Research Of Key Techniques Of Incremental Computing For DAG-Based Framework
2	Study And Implementation Of View Incremental Computation Method In Information System
3	An Optimization Mechanism For Asynchronous Incremental Computation On Dynamic Graph Processing
4	Research On Secure Multiparty Computation Of Set-Related Problems
5	Research On Computation Offloading In MEC Networks Using NOMA
6	Research Of Mind Evolutionary Computation Multi-modal Optimization Performance And Of Mind Evolutionary Computation Parameters Effecting Efficiency
7	Research On Performance Optimization For Distributed Graph Computation
8	Efficient And Secure Verifiable Outsourced Computation For Multiple Source Data
9	Research On GPU-based Computation Method For Line-of-sight Queries
10	Performance Optimization Of Distributed Graph Computation Framework Based On BSP Model