
Optimization And Implementation For DWMS Column-Store Query Execution Engine

Posted on: 2013-01-12 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Zhang
GTID: 2218330371955887 | Subject: Computer software and theory
Abstract/Summary:
The efficiency of query execution is the main concern when developing a data warehouse management system. Recent studies show that column-store systems suit the query-oriented workload of data warehouses better than row-store systems, because a column store reads only the data a query needs and so avoids unnecessary I/O. Column-store systems also achieve higher compression ratios than row-store systems.

We studied data warehouse techniques, column stores, and the core modules of a query execution engine, and then optimized the structures and strategies of the existing query execution engine based on the characteristics of column storage. The structures are the transfer-block and the operator nodes. The strategies are a reusability estimation model based on the relative position of a given operator node in the physical execution tree, and a reuse buffer scheduling algorithm based on the results of that model.

First, we analyze the characteristics of data warehouses and column stores, in particular the storage and query execution approaches of MonetDB, C-Store and InfoBright, as well as their disadvantages.

Second, we describe in detail the core elements of a column-store query execution engine: the physical execution tree, the operator nodes that form it, the way data are organized in memory, and the iterators and iterator network through which data are processed and transferred.

Then, we optimize the existing query execution engine in terms of both structure and strategy, and based on the optimized design we implement a more advanced query execution engine. On the structure side, we design a new transfer-block that can store row IDs as well as values of various types. The new transfer-block can also reconstruct tuples at low cost by exploiting the positional relationships among values.
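The abstract does not specify the transfer-block layout. As a rough illustration only, the Python sketch below assumes a block that carries a list of row IDs plus one position-aligned value list per column; the names `TransferBlock`, `select` and `fetch` are hypothetical and are not taken from the thesis. It shows two ideas mentioned above: a query touches only the columns it needs (the `qty` column is never read), and tuples are rebuilt by position, since the i-th entry of every column in the block belongs to the same row.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class TransferBlock:
    """Hypothetical transfer-block: a batch of row IDs plus, for each
    column, a list of values aligned position-by-position with them."""
    rowids: List[int]
    columns: Dict[str, List[Any]] = field(default_factory=dict)

    def add_column(self, name: str, values: List[Any]) -> None:
        if len(values) != len(self.rowids):
            raise ValueError("column length must match the number of row IDs")
        self.columns[name] = values

    def tuples(self, names: List[str]) -> List[tuple]:
        # The i-th value of every column belongs to row rowids[i],
        # so rebuilding tuples is a positional zip.
        return list(zip(*(self.columns[n] for n in names)))


def select(name: str, column: List[Any],
           predicate: Callable[[Any], bool]) -> TransferBlock:
    """Scan one stored column and keep only the qualifying positions."""
    rowids = [i for i, v in enumerate(column) if predicate(v)]
    block = TransferBlock(rowids)
    block.add_column(name, [column[i] for i in rowids])
    return block


def fetch(block: TransferBlock, name: str, column: List[Any]) -> None:
    """Late materialization: read another column only at the kept row IDs."""
    block.add_column(name, [column[i] for i in block.rowids])


# A toy fact table stored column by column.
price = [10, 250, 40, 300]
region = ["N", "S", "E", "W"]
qty = [1, 2, 3, 4]  # never read by this query

block = select("price", price, lambda p: p > 100)
fetch(block, "region", region)
print(block.rowids)                       # [1, 3]
print(block.tuples(["price", "region"]))  # [(250, 'S'), (300, 'W')]
```

Because values are matched by position rather than by key, reconstruction in this sketch is a linear zip with no hashing or searching, which is one plausible reading of the low-cost tuple reconstruction described above.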
Then, considering that data warehouse data are massive and sorted, we optimize several operator nodes: we use the factory pattern to build operator nodes, reducing the cost of repeated checks; we pre-evaluate predicates in the selection node to avoid unnecessary checks; we redesign the hash join node to be fact-table aware; we improve the traditional sort join node so that it exploits the characteristics of joins between fact tables and dimension tables; and we design methods for extracting both fixed-length and variable-length columns, as well as a print node that handles 19 different scenarios. Throughout development, we consistently put loops inside functions rather than calling functions inside loops, which reduces function-call overhead. On the strategy side, we first propose a reusability estimation model based on the relative position of a given operator node in the physical execution tree and on the estimated volume of the intermediate results it produces during execution. We then present a reuse buffer scheduling algorithm based on the results of this model, and optimize it further.

Finally, we summarize the current state of optimization of column-store query execution engines in data warehouse management systems and outline directions for future research.
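The "loops inside functions" principle, together with factory-built operator nodes, can be illustrated with a small Python sketch. It assumes a selection node that compares one column against a constant; `eval_predicate` and `make_filter` are hypothetical names, and the abstract does not show the thesis's actual engine code. The naive version calls a function and re-checks the operator for every tuple, while the factory version resolves the operator once, when the node is built, and returns a function that loops over the whole block in a single call.

```python
from typing import Callable, List, Sequence

BlockFilter = Callable[[Sequence], List[int]]


def eval_predicate(op: str, constant, value) -> bool:
    """Naive style: one function call and one branch on `op` per tuple."""
    if op == "<":
        return value < constant
    if op == ">":
        return value > constant
    if op == "==":
        return value == constant
    raise ValueError(f"unsupported operator {op!r}")


def make_filter(op: str, constant) -> BlockFilter:
    """Factory style (hypothetical): branch on `op` once at build time, then
    return a specialised function whose loop covers the whole block."""
    if op == "<":
        def run(values: Sequence) -> List[int]:
            return [i for i, v in enumerate(values) if v < constant]
    elif op == ">":
        def run(values: Sequence) -> List[int]:
            return [i for i, v in enumerate(values) if v > constant]
    elif op == "==":
        def run(values: Sequence) -> List[int]:
            return [i for i, v in enumerate(values) if v == constant]
    else:
        raise ValueError(f"unsupported operator {op!r}")
    return run


price = [10, 250, 40, 300]

# Function inside the loop: one call per tuple, each re-checking `op`.
naive = [i for i, v in enumerate(price) if eval_predicate(">", 100, v)]

# Loop inside the function: one call per block, `op` resolved at build time.
gt_100 = make_filter(">", 100)
fast = gt_100(price)

print(naive, fast)  # [1, 3] [1, 3]
```

In a compiled engine the saving comes mainly from removing per-tuple calls and branches; in Python the gap is much smaller, so this sketch shows only the structure of the idea, not its performance.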
Keywords/Search Tags: data warehouse, column-store, query execution, optimization