Research On Query Optimization In Column-Oriented Data Warehouse

Posted on: 2012-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2178330332485815
Subject: Computer software and theory
Abstract/Summary:
Data warehouse querying has long been an active topic in database research. Recent studies show that a column-store, a storage system that reads only the query-related attributes from disk into memory, is better suited to OLAP, data warehousing, and other query-intensive applications. For read-optimized systems with few updates, the view that a column-oriented data warehouse improves query performance has become dominant.

This thesis studies data warehouse techniques, column-stores, and current query optimization methods. It designs and implements the query module of a column-oriented data warehouse, comprising the parser, the preprocessor, the query optimizer, and the plan generator. The query optimizer combines rule-based optimization (RBO) with cost-based optimization (CBO). The thesis then proposes a join strategy optimization for column-oriented queries.

First, the thesis analyzes the query characteristics of data warehouses and current column-oriented query techniques. It discusses in detail the storage models and query methods of several column-oriented systems, such as PAX, InfoBright, C-Store, and MonetDB, and then summarizes the differences in query processing between column-stores and row-stores.

Second, the thesis examines the query module of column-stores in depth and designs and implements the functional modules of the query compiler. The parser is built by combining the parse tree structure with two open source tools, Flex and Bison. The preprocessor is designed and implemented according to the SQL standard and the query tree structure, and consists of three functional modules: semantic analysis, object characteristic binding, and half-plan generation. Building on modern join strategies, the thesis proposes a new column-oriented query optimization method. The method first applies RBO rules for column-oriented queries to filter out candidate plans with large costs, and then applies a CBO algorithm that reorders execution according to the Huffman tree and left-deep tree principles. The execution strategies of each join node in a column-oriented query plan are grouped into two kinds, the pipeline strategy and the parallel strategy, and a cost model is proposed that focuses on estimating the cost of these two strategies. The experimental results show that the method improves the efficiency of query execution in column-oriented systems with low time and space complexity.

Finally, the thesis introduces the principles of the plan generator, including the logical plan and the physical plan. It summarizes the work on query optimization in column-oriented data warehouses and outlines directions for future research.
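The abstract does not spell out how the Huffman tree principle orders joins, so the following is only a minimal illustrative sketch, not the thesis's algorithm. It assumes the principle means "always join the two smallest inputs next", as in Huffman tree construction. The relation names, the uniform `selectivity` parameter, the cost measure (rows fed into each join), and both function names are hypothetical.

```python
import heapq
from itertools import count


def huffman_join_order(cardinalities, selectivity=0.01):
    """Greedy Huffman-style ordering: always join the two smallest inputs next.

    cardinalities: dict mapping a (hypothetical) relation name to its
        estimated row count.
    selectivity: assumed uniform join selectivity used to estimate the
        size of each intermediate result.
    Returns (plan, cost): plan is a nested tuple of names, cost is the
    sum of the row counts fed into every join.
    """
    if not cardinalities:
        raise ValueError("at least one relation is required")
    tie = count()  # unique tie-breaker so the heap never compares plans
    heap = [(rows, next(tie), name) for name, rows in cardinalities.items()]
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        rows_a, _, plan_a = heapq.heappop(heap)  # smallest input
        rows_b, _, plan_b = heapq.heappop(heap)  # second smallest input
        cost += rows_a + rows_b
        out_rows = rows_a * rows_b * selectivity  # assumed size estimate
        heapq.heappush(heap, (out_rows, next(tie), (plan_a, plan_b)))
    return heap[0][2], cost


def is_left_deep(plan):
    """True if the right input of every join is a base relation name."""
    while isinstance(plan, tuple):
        left, right = plan
        if isinstance(right, tuple):
            return False
        plan = left
    return True


plan, cost = huffman_join_order({"A": 1000, "B": 10, "C": 100})
print(plan, is_left_deep(plan))
```

In this example the two smallest relations, B and C, are joined first and A is joined last, which happens to give a left-deep plan. The greedy order alone does not guarantee that shape, so a real optimizer would need an extra step to enforce the left-deep principle.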
Keywords/Search Tags: Data warehouse, column-store, query optimization, join strategy