Font Size: a A A

Multi-Level Dynamic Programming Connected Subset Complement-Pair Algorithm System For Query With Multiple Data Sources And Data Centers

Posted on:2019-08-22Degree:MasterType:Thesis
Country:ChinaCandidate:N N HaoFull Text:PDF
GTID:2370330590451768Subject:Logistics engineering
Abstract/Summary:PDF Full Text Request
With the amount of data generated by human society increasing explosively,the application of big data can be observed everywhere.The response speed of the query directly affects the working efficiency of the large data practitioner,which can affect the social development.Therefore,it is very important to optimize the query and shorten the execution time of it.In query optimization,factors such as join order,join site,and join method are involved and each factor affects each other.Therefore,it is of great theoretical and practical significance to establish a clear model to describe and evaluate the query execution process and to design an efficient query optimization algorithm for the latest distributed database systems with multiple data sources and data centers.In this paper,the query optimization problem in distributed database system with multiple data sources and data centers is studied,and a query execution plan description model based on binary tree is established.The model can describe the specific process of query execution.In addition,this paper also designs a set of query execution cost calculation formula framework,through this framework and supplemented by appropriate evaluation formulas,the query execution plan can be accurately evaluated.This paper designs three algorithms: Multi-Level Dynamic Programming connected subset complement-pair algorithm(MLDPccp),Limited Multi-Level Dynamic Programming connected subset complement-pair algorithm(LMLDPccp)and Improved Join Graph Based Greedy algorithm(IJGBG).In order to compare the effect of these algorithms,a Random Left-Deep Tree algorithm(RLDT)and a traditional Join Graph Based Greedy algorithm(JGBG)are presented.In MLDPccp,the join graph is divided into several connected subgraph.Each subgraph is solved by dynamic programming and the unnecessary crosses products are avoided by enumerate the connected subset complement-pair.In most cases,the algorithm can obtain the optimal solution in the full search space.LMLDPccp is derived from MLDPccp.It reduces the complexity of the algorithm by limiting the search space to find only the optimal solution in the query execution plan with the left deep subtree structure.According to experience,in most cases,the local optimal solution does not have an order of magnitude difference from the global optimal solution.As for the large scale query,IJGBG is improved from JGBG.It is more suitable for distributed database systems with multiple data sources and data centers because it considers the selection factor and transmission speed both.Finally,a large number of numerical experiments are carried out in this paper.The results show that the results of the three algorithms are two to three orders of magnitude better than that of the RLDT algorithm.The results of MLDPccp and LMLDPccp are one order of magnitude better than IJGBG.However,the execution time of the first two increases exponentially with the number of tables involved in the query,while the latter increases linearly.The result of LMLDPccp is about 20% worse than MLDPccp,but the solution scale is enlarged by about 30%.
Keywords/Search Tags:query optimization, join graph, dynamic programming, greedy algorithm
PDF Full Text Request
Related items