The Research Of Incremental Data Warehouse Maintenance

Posted on: 2005-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Guo
Full Text: PDF
GTID: 2168360152467003
Subject: Computer application technology
Abstract/Summary:
Data warehouses contain large amounts of information, often collected from a variety of independent sources. As changes are collected from the data sources, all warehouse data that depends on them must be updated to reflect these changes. Because the warehouse holds so much data, it is not feasible to reload the whole warehouse each time. A method is therefore needed that refreshes the data warehouse efficiently, that is, one that updates the warehouse using only the changed information captured from the data sources. Decision support functions, such as on-line analytical processing (OLAP) and data mining, involve complex aggregate queries over large volumes of data. Warehouse applications therefore build a number of materialized views to improve system performance, and data warehouse refreshment is often viewed as a problem of maintaining materialized views.
In this paper, we propose a framework for refreshing data warehouses incrementally. We show that data warehouse refreshment is a complex process comprising several tasks, e.g., monitoring, extracting, transforming and integrating operational data, and loading and refreshing the data warehouse. When the data warehouse is maintained in the usual way, change information is applied directly to the tables and materialized views in the warehouse. In this paper, a Delta table method is proposed instead: changes are first extracted into Delta tables, and these Delta tables are then used to maintain the warehouse. With this method, a complex maintenance process can be divided into separate steps, so controllability and correctness are better assured.
Because the updates of the data sources and the maintenance of the warehouse are decoupled, data anomalies may occur. This paper introduces a Compensating Algorithm that can eliminate these anomalies. The algorithm is based on the Eager Compensating Algorithm (ECA), but is substantially modified to fit our framework.
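As an illustration of the Delta table idea described above, the following is a minimal Python sketch, not the implementation from the thesis. It assumes a single aggregate view that keeps a SUM and a COUNT per group, and a source change log that contains only inserts and deletes. The names `extract_changes` and `apply_delta`, and the row layouts, are hypothetical.

```python
def extract_changes(change_log):
    """Stage source changes into a delta table.

    change_log: iterable of (op, key, amount), with op in {"insert", "delete"}.
    Returns a list of (key, amount, sign) rows; sign is +1 or -1.
    (Hypothetical layout, chosen only for this sketch.)
    """
    delta = []
    for op, key, amount in change_log:
        if op == "insert":
            delta.append((key, amount, +1))
        elif op == "delete":
            delta.append((key, amount, -1))
        else:
            raise ValueError(f"unsupported operation: {op}")
    return delta


def apply_delta(view, delta):
    """Maintain an aggregate view from the delta table alone.

    view: dict mapping key -> {"sum": number, "count": int}.
    The base data is never re-read; only the staged changes are used.
    """
    for key, amount, sign in delta:
        row = view.setdefault(key, {"sum": 0, "count": 0})
        row["sum"] += sign * amount
        row["count"] += sign
        if row["count"] == 0:
            # No source rows remain in this group, so drop it from the view.
            del view[key]
    return view


# Usage: stage the changes first, then maintain the view from the staged rows.
view = {"east": {"sum": 100, "count": 2}}
log = [("insert", "east", 50), ("delete", "east", 30), ("insert", "west", 20)]
delta_table = extract_changes(log)
apply_delta(view, delta_table)
```

Keeping the COUNT next to the SUM is what lets deletions be handled without rescanning the source: a group is removed exactly when its count reaches zero. Because extraction and application are separate steps, each can be checked or retried on its own.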
The data warehouse is typically unavailable to readers while it is being maintained. To shorten this maintenance window, we split the maintenance work into a propagate process and a refresh process. During the propagate process the warehouse remains available to readers, so as much work as possible should be done in that process in order to minimize the time required by the refresh process.
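The propagate/refresh split can be sketched as follows. This is a hedged illustration under simplifying assumptions, not the procedure from the thesis: the view stores one (sum, count) pair per group, delta rows have the hypothetical form (key, amount, sign), and `propagate` and `refresh` are names invented for this example.

```python
def propagate(delta_rows):
    """Propagate step: pre-aggregate raw delta rows into one net change per group.

    Reads only the delta rows and never touches the view, so readers can keep
    querying the warehouse while this (potentially expensive) step runs.
    Returns a dict mapping key -> [net_sum, net_count].
    """
    summary = {}
    for key, amount, sign in delta_rows:
        net = summary.setdefault(key, [0, 0])
        net[0] += sign * amount
        net[1] += sign
    return summary


def refresh(view, summary_delta):
    """Refresh step: apply the pre-aggregated changes to the view.

    This is the only step that modifies the view, and it does one cheap
    update per affected group, which keeps the unavailable period short.
    view: dict mapping key -> (sum, count).
    """
    for key, (d_sum, d_count) in summary_delta.items():
        total, count = view.get(key, (0, 0))
        total += d_sum
        count += d_count
        if count == 0:
            view.pop(key, None)  # group has no remaining rows
        else:
            view[key] = (total, count)
    return view


# Usage: the heavy aggregation happens before the view is locked for refresh.
view = {"east": (100, 2)}
delta_rows = [("east", 50, +1), ("east", 30, -1), ("west", 20, +1), ("west", 20, -1)]
pending = propagate(delta_rows)   # warehouse still readable here
refresh(view, pending)            # short exclusive step
```

The design choice is that the cost of `refresh` grows with the number of affected groups rather than with the number of raw changes, because all per-row work has already been done in `propagate`.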
Keywords/Search Tags:data warehouse, incremental refreshment, materialized view, aggregate function, data anomaly