Research And Improvement Of Materialized View Selection Algorithm And Maintenance Algorithm In Data Warehouse

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: G W Jiang
GTID: 2308330485991142
Subject: Computer software and theory
Abstract/Summary:
A data warehouse is a subject-oriented, integrated, stable (non-volatile) collection of data that reflects historical changes. It stores data drawn from heterogeneous databases, such as object-oriented databases, relational databases and desktop databases. As a semantically consistent data store, a data warehouse provides data support for online analytical processing (OLAP) and data mining (DM).
A data warehouse must support complex, ad hoc and time-consuming queries. To speed up query responses, materialized view technology computes the results of frequently used queries in advance and stores them physically. Which views should be materialized? In real-world applications, the choice must balance storage cost, query cost and view maintenance cost. In this thesis, views are organized in a data cube, and a cost model for query and maintenance costs is proposed. Under a given storage space limit, the set of views with the minimum combined query and maintenance cost is selected. The algorithm first obtains the candidate views through pre-treatment and then calculates the cost of each candidate according to the view cost model. The selection is based on a genetic algorithm, improved with a mixed selection-operator strategy and an adaptive crossover probability. Compared with the classical genetic algorithm, the improved algorithm reduces the cost of the view search and improves the efficiency of data warehouse queries.
Although materialized views effectively speed up the system's response to user queries, they introduce a maintenance cost. Because a materialized view physically stores the results of a query, it must be kept consistent with the underlying data. When the underlying data sources change through inserts, updates and deletes, the affected materialized views must be either recomputed or incrementally maintained to keep the data consistent.
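The view selection procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the three-dimension cube, the view sizes, the uniform query frequencies, the per-row maintenance cost (`MAINT_PER_ROW`) and the storage budget (`BUDGET`) are all hypothetical. Query cost follows the common linear cost model (a query is answered by scanning the smallest stored view that contains its dimensions); pre-treatment is assumed to mean discarding views that alone exceed the budget; the "mixed" selection is assumed to be elitism plus half tournament and half roulette-wheel selection; and the adaptive crossover probability is assumed to fall linearly for parents fitter than the population average. All function names are illustrative.

```python
import random

# Hypothetical 3-dimension cube (p = part, s = supplier, c = customer). Each view is
# keyed by the set of dimensions it groups by; sizes are illustrative row counts.
SIZES = {
    frozenset("psc"): 6_000_000, frozenset("pc"): 6_000_000,
    frozenset("ps"): 800_000, frozenset("sc"): 6_000_000,
    frozenset("p"): 200_000, frozenset("s"): 10_000,
    frozenset("c"): 100_000, frozenset(): 1,
}
TOP = frozenset("psc")                     # base cuboid, always stored
QUERY_FREQ = {v: 1.0 for v in SIZES}       # assumed: every view queried equally often
MAINT_PER_ROW = 0.1                        # assumed maintenance cost per stored row
BUDGET = 1_000_000                         # assumed storage limit in rows (excluding TOP)

# Assumed pre-treatment: drop the base cuboid and any view that alone exceeds the budget.
CANDIDATES = [v for v in SIZES if v != TOP and SIZES[v] <= BUDGET]


def total_cost(bits):
    """Query cost plus maintenance cost of one chromosome; inf if over budget."""
    chosen = [v for v, b in zip(CANDIDATES, bits) if b]
    if sum(SIZES[v] for v in chosen) > BUDGET:
        return float("inf")
    stored = chosen + [TOP]
    # Linear cost model: a query on view q scans the smallest stored view u
    # whose dimensions include all of q's dimensions (q <= u as sets).
    query = sum(QUERY_FREQ[q] * min(SIZES[u] for u in stored if q <= u) for q in SIZES)
    maint = sum(MAINT_PER_ROW * SIZES[v] for v in chosen)
    return query + maint


def fitness(bits):
    return 1.0 / total_cost(bits)          # 0.0 for infeasible chromosomes


def select(pop, fits, k, rng):
    """Assumed mixed selection: half the pool by tournament, half by roulette wheel."""
    pool = []
    for _ in range(k // 2):
        a, b = rng.sample(range(len(pop)), 2)
        pool.append(pop[a] if fits[a] >= fits[b] else pop[b])
    weights = fits if sum(fits) > 0 else None      # uniform if all are infeasible
    pool += rng.choices(pop, weights=weights, k=k - k // 2)
    return pool


def adaptive_pc(f_parent, f_avg, f_max, pc_max=0.9, pc_min=0.6):
    """Assumed schedule: fitter-than-average parents get a lower crossover probability."""
    if f_parent < f_avg or f_max == f_avg:
        return pc_max
    return pc_max - (pc_max - pc_min) * (f_parent - f_avg) / (f_max - f_avg)


def ga_select_views(pop_size=20, generations=40, pm=0.05, elite=2, seed=0):
    rng = random.Random(seed)
    n = len(CANDIDATES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=total_cost)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        f_avg, f_max = sum(fits) / len(fits), max(fits)
        nxt = [c[:] for c in sorted(pop, key=total_cost)[:elite]]   # elitism
        pool = select(pop, fits, pop_size - elite, rng)
        for i in range(0, len(pool) - 1, 2):
            p1, p2 = pool[i][:], pool[i + 1][:]
            if rng.random() < adaptive_pc(max(fitness(p1), fitness(p2)), f_avg, f_max):
                cut = rng.randrange(1, n)                # one-point crossover
                p1[cut:], p2[cut:] = p2[cut:], p1[cut:]
            nxt += [p1, p2]
        if len(pool) % 2:
            nxt.append(pool[-1][:])
        for c in nxt[elite:]:                            # bit-flip mutation, elites untouched
            for j in range(n):
                if rng.random() < pm:
                    c[j] ^= 1
        pop = nxt
        best = min(pop + [best], key=total_cost)         # best chromosome seen so far
    return [v for v, b in zip(CANDIDATES, best) if b], total_cost(best)


if __name__ == "__main__":
    views, cost = ga_select_views()
    print(sorted("".join(sorted(v)) or "()" for v in views), round(cost))
```

With only five candidates an exhaustive search could check the answer directly; a genetic search becomes worthwhile when the cube has too many views to enumerate.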
Materialized views are derived from data held in multiple remote sources, so ensuring that their contents remain consistent with those sources when the source data change has become a critical and difficult problem in data warehousing research. Based on an analysis of commonly used materialized view maintenance algorithms, I focused on the algorithm that groups base tables by update frequency and improved it: within each group, the underlying tables are sorted in ascending order of increment size, and the materialized views are updated in that order. Experiments show that the improved algorithm increases the efficiency of materialized view maintenance.
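The grouping-and-ordering step can be sketched as follows. This is a hypothetical illustration, not the thesis's algorithm: the table names, row counts, deltas and update frequencies are invented; deltas are assumed to be insert-only; the view is assumed to be a per-key `COUNT(*)` over the natural join of all tables; and processing the most frequently updated group first is an assumption, since the abstract does not state the order of the groups. `plan`, `maintain` and `recompute` are illustrative names.

```python
from collections import Counter
from math import prod

# Hypothetical base tables: join_key -> row count. Hypothetical view V:
# COUNT(*) of the natural join of all tables on join_key, grouped by join_key.
BASE = {
    "orders":   Counter({1: 3, 2: 1, 3: 2}),
    "lineitem": Counter({1: 5, 2: 4, 3: 1}),
    "payments": Counter({1: 1, 2: 2, 3: 1}),
    "returns":  Counter({1: 1, 2: 1, 3: 1}),
}
DELTAS = {                                   # assumed insert-only increments
    "orders":   Counter({1: 2, 2: 2, 3: 1}),     # 5 rows
    "lineitem": Counter({3: 2}),                 # 2 rows
    "payments": Counter({1: 1, 2: 2}),           # 3 rows
    "returns":  Counter({3: 1}),                 # 1 row
}
FREQ = {"orders": 24, "lineitem": 24, "payments": 4, "returns": 4}  # assumed refreshes/day


def recompute(tables):
    """Full recomputation: per key, product of each table's matching row count."""
    keys = set.intersection(*(set(t) for t in tables.values()))
    return Counter({k: prod(t[k] for t in tables.values()) for k in keys})


def plan(freq, deltas):
    """Group tables by update frequency (most frequent group first, an assumption),
    then order each group by ascending increment size."""
    groups = {}
    for name, f in freq.items():
        groups.setdefault(f, []).append(name)
    order = []
    for f in sorted(groups, reverse=True):
        order += sorted(groups[f], key=lambda t: sum(deltas[t].values()))
    return order


def maintain(view, tables, deltas, order):
    """Apply each table's delta in turn, updating the view incrementally."""
    for name in order:
        others = [t for n, t in tables.items() if n != name]
        for key, cnt in deltas[name].items():
            # New joined rows = delta rows x current matching rows in the other tables.
            gain = cnt * prod(t[key] for t in others)
            if gain:
                view[key] += gain
        tables[name].update(deltas[name])    # fold the delta into the base table
    return view


if __name__ == "__main__":
    tables = {n: Counter(c) for n, c in BASE.items()}
    view = recompute(tables)                 # initial materialization
    order = plan(FREQ, DELTAS)
    maintain(view, tables, DELTAS, order)
    assert view == recompute(tables)         # incremental result matches recomputation
    print(order)
```

Because each delta is joined with the current state of the other tables before it is folded in, the final view equals a full recomputation whatever order is used; the ordering only determines the sequence in which the increments are applied.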
Keywords/Search Tags: data warehouse, OLAP, materialized views, genetic algorithm, update frequency, maintenance