Research on the Optimization of the Schedule Engine of a Column-Oriented Database Based on Multi-core Processors

Posted on: 2017-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: W T Chen
GTID: 2348330536953160
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the rise of big data, data query and analysis are becoming increasingly important, and improving query efficiency deserves more attention. This thesis is based on GSQL, a high-performance column-oriented database. The system consists of a compiler that translates a SQL query into an execution plan, a scheduler that dispatches primitives to multiple processors, and a back-end storage engine. The scheduler is designed for multi-core processors and dispatches primitives according to the data stream. To improve query efficiency and speed up query analysis, this thesis studies the scheduler and focuses on optimizing it. The main contributions are:

(1) Query execution is constrained by the differing execution speeds of the primitives, so task partition is proposed to address this. Task partition divides primitives into small tasks that form a pipeline structure. Experimental results show that task partition improves query speed by at least 22%.

(2) A new primitive scheduling and execution mechanism is proposed. Because I/O primitives and CPU primitives are executed by different units, the schedule engine creates two threads that assign I/O primitives and CPU primitives to execution units independently, which improves the utilization of those units.

(3) A topological-sorting scheduling strategy based on each primitive's longest path length and depended degree is proposed. The longest path length indicates how far a primitive is from the output primitives, and the depended degree is the number of succeeding primitives it has. Priorities are assigned according to these two values. Experimental results show that this strategy improves query speed by up to 28%.
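
The scheduling strategy in contribution (3) can be illustrated with a minimal Python sketch. It is not the GSQL implementation: the function name `schedule_order`, the primitive names, and the exact priority rule (longer path to the output first, ties broken by higher depended degree, then by name) are assumptions made for illustration, since the abstract does not state how the two values are combined.

```python
import heapq


def schedule_order(successors):
    """Return (order, longest) for a DAG of primitives.

    successors maps each primitive name to the primitives that consume its
    output. Output primitives have no successors. The priority rule used here
    is an assumption: longest path first, then depended degree, then name.
    """
    nodes = set(successors)
    for outs in successors.values():
        nodes.update(outs)
    succ = {n: list(successors.get(n, [])) for n in nodes}

    indegree = {n: 0 for n in nodes}
    for n in nodes:
        for s in succ[n]:
            indegree[s] += 1

    # Plain topological order (Kahn's algorithm), used only to compute paths.
    remaining = dict(indegree)
    ready = sorted(n for n in nodes if remaining[n] == 0)
    topo = []
    while ready:
        n = ready.pop()
        topo.append(n)
        for s in succ[n]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    if len(topo) != len(nodes):
        raise ValueError("primitive graph contains a cycle")

    # Longest path length (in edges) from each primitive to an output primitive.
    longest = {}
    for n in reversed(topo):
        longest[n] = 1 + max((longest[s] for s in succ[n]), default=-1)

    def key(n):
        # Smaller tuples are popped first, so negate both priority values.
        return (-longest[n], -len(succ[n]), n)

    # Priority-driven topological sort: always run the best ready primitive.
    remaining = dict(indegree)
    heap = [key(n) for n in nodes if remaining[n] == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        n = heapq.heappop(heap)[2]
        order.append(n)
        for s in succ[n]:
            remaining[s] -= 1
            if remaining[s] == 0:
                heapq.heappush(heap, key(s))
    return order, longest


# Hypothetical query plan; the primitive names are invented for this example.
plan = {
    "scan_a": ["filter_a"],
    "filter_a": ["join"],
    "scan_b": ["join", "count_b"],
    "join": ["output"],
    "count_b": ["output"],
    "scan_c": ["output"],
    "output": [],
}
order, longest = schedule_order(plan)
print(order)
```

In this example `scan_b` is scheduled before `filter_a` even though both are two steps from the output, because `scan_b` has two succeeding primitives and `filter_a` has one.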
Keywords/Search Tags: Column-oriented database, primitive scheduling and execution mechanism, task partition, longest path length, depended degree