
Research On Query Task Scheduling Method Of Distributed Database

Posted on: 2018-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: P Wu
Full Text: PDF
GTID: 2428330596454763
Subject: Software engineering
Abstract/Summary:
The rapid development of Internet technology has gradually changed patterns of public consumption. Today, a variety of transaction patterns pose new challenges for database technology, and traditional distributed technology can no longer meet the processing needs of explosively growing, massive data sets. Built on massively parallel processing (MPP) technology, Greenplum database provides distributed storage and large-scale parallel computing, and it has become a new technology for the analysis and processing of massive data. For a distributed database, how to schedule query tasks effectively in order to optimize queries has always been a focus of research. This thesis optimizes Greenplum database from the perspective of query task scheduling.

Greenplum database uses a centralized scheduling strategy. The master acts as the scheduler and uses a static scheduling method to handle the parallel processing of the entire query. This scheduling process maps the query tasks to the segment nodes. With the development of hardware technology, multi-core technology provides a new direction for parallel computing. When Greenplum database is used in OLAP scenarios, the cluster environment mostly relies on multi-core CPU systems to achieve parallelization. This thesis studies the centralized scheduling method of the Greenplum distributed database and proposes improvement strategies for the two stages of the scheduling scheme in order to improve query efficiency and system performance. To optimize the scheduling scheme of Greenplum database, the following research has been done.

First, the task allocation process is optimized. This thesis designs a better query plan generation scheme to assign query tasks to segment nodes. Considering the factors that affect the execution of the plan, the optimization scheme is designed along three lines: reducing the search space, building a better cost model, and using an effective search strategy. The thesis first designs a cost estimation model that takes both the cost of data manipulation and the cost of data transmission into account. Based on this cost model, the thesis uses a parallel max-min ant colony algorithm to optimize the query join order and thereby efficiently generate a better query plan.

Then, the task scheduling phase is optimized. After receiving the query plan from the master, each segment schedules the tasks by mapping them onto the cores of its CPU. The task scheduling problem on a segment is equivalent to task scheduling in a heterogeneous multi-core system. This thesis proposes a multi-heuristic list scheduling algorithm, MHECP, which combines the traditional list scheduling algorithms HEFT and CPOP to solve the scheduling problem of heterogeneous systems. MHECP improves the task priority calculation and the elimination of redundant tasks. The improved algorithm achieves more efficient task scheduling without increasing the complexity of the algorithm.

Based on the above research, this thesis implements the optimized scheduling process in Greenplum database at both the task assignment stage and the task scheduling stage. Combined with Greenplum's preprocessing and pre-optimization process, the optimized scheduling strategy can effectively improve query efficiency and system performance. Finally, a series of comparative experiments shows that Greenplum database has significant advantages in handling big data. A comparison of the optimized scheduling strategy with the original, pre-optimization scheduling strategy shows that the task scheduling optimization proposed in this thesis improves the query efficiency of Greenplum database to a certain extent and is practical.
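
The segment-level scheduling described above builds on the list scheduling algorithm HEFT. As a rough illustration only, the following Python sketch shows plain HEFT, not the thesis's MHECP, whose priority calculation and redundant task elimination are not specified in this abstract. It ranks tasks by their upward rank (average run time plus the longest remaining path to an exit task) and then places each task on the core that gives the earliest finish time. The function name `heft_schedule`, the input layout, and the four-task example are hypothetical. For brevity the sketch appends each task at the end of a core's queue instead of using HEFT's insertion-based slot search, and it assumes strictly positive run times.

```python
from functools import lru_cache


def heft_schedule(succ, cost, comm):
    """Simplified HEFT (illustrative, not the thesis's MHECP).

    succ: task -> list of successor tasks (a DAG)
    cost: task -> list of run times, one per core (all > 0)
    comm: (u, v) -> transfer time if u and v run on different cores
    """
    tasks = list(cost)
    n_cores = len(next(iter(cost.values())))

    # Build predecessor lists from the successor lists.
    pred = {t: [] for t in tasks}
    for u in tasks:
        for v in succ.get(u, []):
            pred[v].append(u)

    @lru_cache(maxsize=None)
    def rank_u(t):
        # Upward rank: average run time plus the costliest path to an exit task.
        avg = sum(cost[t]) / n_cores
        tail = max((comm[(t, s)] + rank_u(s) for s in succ.get(t, [])),
                   default=0.0)
        return avg + tail

    # With positive run times, decreasing rank_u is a valid topological order.
    order = sorted(tasks, key=rank_u, reverse=True)

    core_free = [0.0] * n_cores   # time at which each core becomes idle
    placed = {}                   # task -> (core, start, finish)
    for t in order:
        best = None
        for p in range(n_cores):
            # Data from a predecessor on another core arrives after comm delay.
            ready = max((placed[u][2] + (0.0 if placed[u][0] == p else comm[(u, t)])
                         for u in pred[t]), default=0.0)
            start = max(ready, core_free[p])
            finish = start + cost[t][p]
            if best is None or finish < best[2]:
                best = (p, start, finish)
        placed[t] = best
        core_free[best[0]] = best[2]
    return order, placed


# Hypothetical four-task diamond DAG on two heterogeneous cores.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cost = {"A": [2, 3], "B": [3, 2], "C": [4, 4], "D": [2, 1]}
comm = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}

order, placed = heft_schedule(succ, cost, comm)
makespan = max(finish for _, _, finish in placed.values())
```

In this example the task on the longer path (C) is prioritized ahead of B, and B is moved to the second core because the communication delay costs less than waiting for the first core to become free.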
Keywords/Search Tags: MPP, Greenplum, query plan, task assignment, task scheduling