Research On Key Technologies Of Query Optimization With Strong Robustness In Distributed Environment

Posted on: 2021-04-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J T Gao
GTID: 1528307100974629
Subject: Computer Science and Technology
Abstract/Summary:
In a distributed environment, join queries have always been the performance bottleneck of the database system, and query optimization is the main way to overcome this bottleneck. When factors such as statistics, data distribution, and node status are unstable, current technologies cannot guarantee the quality of optimization, which may lead to serious degradation of query performance; this is the robustness problem. Current strategies cannot improve performance while also guaranteeing robustness for multi-way join queries in a distributed environment. The major problems are:

(1) Statistics collection does not comprehensively consider collection efficiency, accuracy loss, and the impact on system performance.
(2) Execution plans process a large amount of unnecessary data, which degrades query performance and robustness.
(3) Scheduling strategies cannot identify effective task allocations and do not take robustness into account.
(4) When execution tasks are localized, the localization results cannot be reused, and the impact of data migration on system performance is not considered.

To solve these problems, we conduct an in-depth study of query optimization in distributed environments and propose a series of strategies to improve the performance and robustness of multi-way join queries. This work was supported by the fund of the Key Project of Natural Science Foundation of China. The main topics and contributions of this thesis can be summarized as follows:

(1) Statistics Information Collecting. An adaptive collection strategy, called ASC, is proposed. The strategy integrates rich system information, defines a variety of trigger thresholds for collection tasks, and starts the right task at the right time. To make full use of system resources and improve collection efficiency while keeping the statistics correct, ASC assigns an appropriate execution model to each task based on its workload. In addition, ASC adaptively decides when to execute collection tasks according to node states. The experimental results show that, compared with other strategies, ASC generally improves collection efficiency and the correctness of statistics while reducing the negative effect on system performance.

(2) Plan Generating. A general data pruning strategy, called DPHR, is proposed; it can be implemented in both centralized and distributed database systems. First, DPHR constructs multiple independent pruning units based on the logical join graph. Then, for each element in a pruning unit, DPHR uses a novel statistic called the hollow range (HR) to cut out the data ranges that contain no data, producing multiple sub-ranges of existing data. These sub-ranges are iteratively aligned until the pruning unit reaches a stable state. Finally, DPHR applies the pruned elements in the appropriate places. To evaluate DPHR more thoroughly, a new benchmark, called DHR, is built in addition to the traditional ones. The experimental results show that HR is correct, that data pruning based on DPHR is highly efficient, that DPHR substantially improves query performance and robustness, and that DPHR also scales well.

(3) Task Allocating. An efficient and robust plan scheduling strategy, called AlCo, is proposed. AlCo introduces a novel task allocation algorithm, called MGA, which is based on the genetic algorithm in PostgreSQL. To improve the robustness of task allocation, AlCo incorporates into MGA an upper bound computed with current heuristic strategies. Because it is cost-based, AlCo can adapt to different scenarios and objectively determine the optimal allocation. The experimental results show that, compared with other strategies, AlCo generally improves performance and performs well in terms of robustness and scalability.

(4) Plan Executing. A task-centric strategy for generating local execution places, called TD-lep, is proposed. TD-lep introduces a novel structure called the task directory (TD), which records the localization state of tasks, including the progress of localization, execution places, migration plan, task priority, and other key information. Before allocating a task, TD-lep first checks whether the task is contained in the TD. If the task is found in the TD and its localization has finished, TD-lep directly returns the execution places allocated to it. Otherwise, the task is inserted into the TD, and its migration plan and priority are generated. TD-lep migrates data smoothly according to task priority and task load to reduce the negative effect on system performance. The experimental results show that, compared with other strategies, TD-lep allocates execution places for tasks with high efficiency and keeps the negative effect on system performance low during data migration.
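
The task directory lookup described for TD-lep can be illustrated with a minimal Python sketch. This is not the thesis implementation: the class names, field types, and method names below are hypothetical, priority is assumed to be an integer where a larger value means "migrate sooner", and task load, which the abstract also mentions, is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskEntry:
    # Hypothetical record: the abstract names these fields but not their types.
    places: List[str]            # execution places chosen for the task
    migration_plan: List[str]    # data moves needed to localize the task
    priority: int                # assumed: larger value = migrate sooner
    localized: bool = False      # True once localization has finished


class TaskDirectory:
    """Illustrative task directory (TD) keyed by a task identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, TaskEntry] = {}

    def lookup_or_register(
        self,
        task_id: str,
        places: List[str],
        migration_plan: List[str],
        priority: int,
    ) -> Optional[List[str]]:
        entry = self._entries.get(task_id)
        if entry is not None and entry.localized:
            # Task is in the TD and fully localized: reuse the stored places.
            return entry.places
        if entry is None:
            # Unknown task: record its places, migration plan and priority.
            self._entries[task_id] = TaskEntry(
                list(places), list(migration_plan), priority
            )
        # Localization is still pending, so no places are returned yet.
        return None

    def next_migration(self) -> Optional[str]:
        # Choose the pending task with the highest priority.
        pending = [
            (entry.priority, task_id)
            for task_id, entry in self._entries.items()
            if not entry.localized
        ]
        return max(pending)[1] if pending else None

    def mark_localized(self, task_id: str) -> None:
        self._entries[task_id].localized = True
```

In this sketch, a second request for an already localized task returns the stored places without recomputing them, which is the reuse the abstract describes; how migration plans and priorities are actually derived is not specified in the abstract and is simply passed in by the caller here.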
Keywords/Search Tags:Robustness, Distributed query optimization, Self-adaption, Data pruning, Genetic algorithm, Load perception