Research On Query Optimization In Data Warehouse

Posted on: 2007-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y G Wang
GTID: 2178360212965609
Subject: Computer application technology
Abstract/Summary:
A data warehouse stores integrated information for query and analysis and holds a large volume of historical data, so saving storage space matters. At the same time, it must support ad hoc, complex queries that access many records and perform expensive join and aggregation operations, so queries take a long time. Technical support is therefore needed to improve query efficiency, including techniques that trade storage space for query speed. This thesis studies the optimization of materialized views, the dimension table schema, and clustered indexes in order to improve the key measures by which query techniques are judged, namely query speed and storage space.

Materialized views are an important means of improving execution efficiency in a data warehouse, but storing them costs space. A cost estimation model is built whose measure combines the time cost of queries, taken as the size of the materialized view that must be scanned during the query or the size of the fact tables, with the storage cost of the materialized views. A materialized view optimization algorithm based on genetic algorithms is then designed to minimize the sum of the storage cost of the materialized views and the time cost of the queries.

Using a snowflake schema in a data warehouse incurs considerable join cost.
A cost estimation model is built whose measure combines the time cost of queries with the storage cost of the dimension tables, and a dimension table schema optimization algorithm based on genetic algorithms is designed. The algorithm adjusts the dimension table schema automatically and nondestructively, minimizing the sum of the storage cost of the dimension tables and the time cost of the queries; in other words, it buys the greatest gain in query speed for the least additional space.

Sensible use of a clustered index can greatly speed up the computation of aggregate functions in OLAP queries. To achieve this, the order of the clustered index keys is determined from the volume of data accessed when the index is built. Based on how queries actually execute while the data warehouse is running, a suitable order of the clustered index keys is computed and the clustered index is rebuilt in that order, which reduces disk I/O and increases the query speed of the system.
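
The materialized view cost model described in the abstract can be illustrated with a small sketch. The abstract gives no formulas or parameters, so everything concrete here is an assumption made for illustration: the row counts, the `ANSWERS` mapping from queries to usable views, the `STORAGE_WEIGHT` factor, and the particular genetic operators (tournament selection, one-point crossover, bit-flip mutation, elitism). It is not the thesis's algorithm, only one plausible reading of "minimize the sum of the storage cost of the materialized views and the time cost of the queries".

```python
import random

# Hypothetical sizes in rows; none of these figures come from the thesis.
FACT_TABLE_ROWS = 1_000_000
VIEW_ROWS = [200_000, 50_000, 5_000, 80_000]   # candidate materialized views
# ANSWERS[q] = indices of the candidate views that can answer query q (assumed).
ANSWERS = [{0, 1}, {1, 2}, {3}, set(), {0, 3}]
STORAGE_WEIGHT = 1.0   # assumed cost of one stored row relative to one scanned row


def total_cost(selection):
    """Query time cost plus storage cost for a 0/1 selection vector."""
    storage = sum(rows for rows, bit in zip(VIEW_ROWS, selection) if bit)
    query_time = 0
    for usable in ANSWERS:
        chosen = [VIEW_ROWS[v] for v in usable if selection[v]]
        # Scan the smallest selected view that answers the query,
        # otherwise fall back to scanning the fact table.
        query_time += min(chosen, default=FACT_TABLE_ROWS)
    return query_time + STORAGE_WEIGHT * storage


def genetic_search(generations=60, pop_size=20, mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm over bit vectors of selected views."""
    rng = random.Random(seed)
    n = len(VIEW_ROWS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(population, key=total_cost)
    for _ in range(generations):
        next_gen = [best[:]]                       # elitism: carry the best forward
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = min(rng.sample(population, 3), key=total_cost)
            p2 = min(rng.sample(population, 3), key=total_cost)
            cut = rng.randint(1, n - 1)            # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = min(population, key=total_cost)
    return best, total_cost(best)
```

Calling `genetic_search()` returns a selection vector and its cost. With these assumed figures, materializing nothing costs 5,000,000 row units, while `total_cost([0, 1, 1, 1])` is 1,350,000, which shows how a modest storage cost can buy a large reduction in query time. The same structure would apply to the dimension table schema problem by changing what the bit vector encodes and how the query cost is computed.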
Keywords/Search Tags: data warehouse, genetic algorithms, materialized view, dimension table, clustered index