Incremental Maintenance Of Data Warehouse Architecture Research

Posted on: 2008-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zhao
Full Text: PDF
GTID: 2208360212987289
Subject: Industrial Economics
Abstract/Summary:
While a database meets the need to store data, especially routine business data, a data warehouse meets the need to organize and process that data and so yield information for decision making. In an era when rapid change makes rapid response important, high-quality decision information calls for a highly efficient data warehouse, which in turn drives the need to improve the warehouse maintenance process.

A data warehouse contains a large amount of information, often collected from a variety of independent sources. As changes are collected from the data sources, all data in the warehouse that depends on them must be updated to reflect those changes. Because the warehouse holds so much data, it is infeasible to reload the whole warehouse each time, so a method is needed that refreshes it efficiently; that is, the warehouse is updated only with the changed data captured from the data sources.

Decision support functions, such as online analytical processing (OLAP) and data mining, involve complex aggregate queries over large volumes of data. Warehouse applications therefore build a number of materialized views, which store the results of queries that are used frequently or are expensive to compute, to improve system performance. Because materialized views are based on data source tables, they also need to be maintained, and as part of data warehouse maintenance, materialized view maintenance is also incremental.

Although materialized views hold an irreplaceable position in the data warehouse, they form an organic whole with the other objects and interacting modules of the warehouse, so the discussion of data warehouse maintenance cannot be restricted to materialized views. This thesis proposes a framework for refreshing data warehouses incrementally.
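As a rough illustration of why incremental maintenance of a materialized view avoids a full reload, the sketch below keeps a per-region SUM/COUNT aggregate up to date from changed rows only. It is a minimal Python sketch, not the thesis's implementation: the view layout and the names `full_recompute` and `apply_changes` are hypothetical, and updates are assumed to arrive as a delete followed by an insert.

```python
from collections import defaultdict


def full_recompute(rows):
    """Rebuild the aggregate view (region -> (sum, count)) from all base rows."""
    view = defaultdict(lambda: [0, 0])
    for region, amount in rows:
        view[region][0] += amount
        view[region][1] += 1
    return {region: tuple(agg) for region, agg in view.items()}


def apply_changes(view, inserted, deleted):
    """Maintain the view incrementally, touching only the changed rows.

    Assumption: every deleted row was previously counted in the view.
    """
    result = {region: list(agg) for region, agg in view.items()}
    for region, amount in inserted:
        entry = result.setdefault(region, [0, 0])
        entry[0] += amount
        entry[1] += 1
    for region, amount in deleted:
        entry = result[region]
        entry[0] -= amount
        entry[1] -= 1
        if entry[1] == 0:  # no base rows left for this group
            del result[region]
    return {region: tuple(agg) for region, agg in result.items()}
```

Keeping the count next to the sum is what lets a group be removed once its last base row is deleted; an aggregate such as MIN or MAX would need extra bookkeeping that this sketch does not cover.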
Data warehouse maintenance is shown to be a complex process comprising several tasks, e.g., monitoring, extracting, transforming and integrating operational data, loading and refreshing the data warehouse, and preserving historical data. The framework decomposes the maintenance task into modules and reflects the interaction between them.

A prerequisite for incremental maintenance is to monitor and capture the changed data of the data sources. Different types of data sources call for different methods of achieving this. The thesis analyzes some of these methods and also introduces the mechanism applied in Oracle 9i.

When the data warehouse is maintained in the usual way, changes are applied directly to the tables and materialized views in the warehouse. This thesis proposes a Delta-table method instead. Changes are first extracted to Delta tables, which also record the update mode (insert, delete or update) of the data, and these Delta tables are then used to maintain the warehouse. In this way a complex maintenance process can be divided into simpler steps, which further assures controllability and correctness.

The data warehouse is typically unavailable to readers while it is being maintained. To reduce the time required for maintenance and so improve the efficiency of the data warehouse, the maintenance work is split into a propagating process and a refreshing process. During the propagating process, the data that will later be applied to the objects in the data warehouse is prepared at the back end, and the warehouse remains available to users. As much work as possible should therefore be done at this stage to shorten the next stage, the refreshing process. During the refreshing process, because the data has already been prepared with the same structure as the corresponding objects in the data warehouse, the process achieves higher performance and has much less effect on the data warehouse.

Today, as growing unpredictability and severe competition drive business units to quickly identify changes arising both internally and externally and to respond appropriately, real-time business intelligence is gaining favor. A dependable real-time BI system calls for an equally dependable data warehouse that provides the data needed for analysis and decision making, and this in turn calls for the same level of timeliness in the data warehouse. It should be noted that a real-time data warehouse does not fit all circumstances, and that the shortest possible lag between a change in a data source and its reflection in the data warehouse is not always the best choice. The right choice depends mainly on two factors: the availability of data and the business operation itself. While the former is a restriction that can be overcome technically, the latter is what really needs to be taken into account, e.g., whether it makes sense for a specific business problem to implement a real-time data warehouse that supports real-time BI and real-time responses, and whether the gain justifies the cost.

Building on the preceding work, the thesis introduces a near-real-time data warehouse maintenance framework. It considers the different strategies that each module of the earlier framework can adopt; combining them yields several levels of timeliness, which meet the differing real-time needs of data warehouses.
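The Delta-table method and the split into propagating and refreshing processes can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the thesis's design: the warehouse table is modelled as a dictionary keyed by primary key, and the names `propagate`, `refresh`, and the `transform` callback are hypothetical.

```python
def propagate(captured_changes, transform):
    """Propagating process: build the Delta table while the warehouse stays readable.

    Each Delta row records the update mode ('insert', 'delete' or 'update'),
    the key, and a row already shaped like the target warehouse table.
    """
    delta_table = []
    for mode, key, source_row in captured_changes:
        # Deletes carry no payload; other modes are transformed up front.
        target_row = None if mode == "delete" else transform(source_row)
        delta_table.append({"mode": mode, "key": key, "row": target_row})
    return delta_table


def refresh(warehouse_table, delta_table):
    """Refreshing process: apply the prepared Delta rows to the warehouse in place."""
    for delta in delta_table:
        if delta["mode"] == "delete":
            warehouse_table.pop(delta["key"], None)
        else:
            # Insert and update both write the row prepared during propagation.
            warehouse_table[delta["key"]] = delta["row"]


def to_warehouse_row(source_row):
    """Hypothetical transformation from a source row to the warehouse schema."""
    return {"customer": source_row["cust"].upper(), "total": source_row["cents"] / 100}
```

All transformation work happens in `propagate`, so `refresh`, the only step during which the warehouse would need to be locked, reduces to simple keyed writes and deletes.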
Keywords/Search Tags:data warehouse, incremental maintenance, framework