
An Approach Of Improving The Data Quality Of Data Warehouse And Its Application

Posted on: 2010-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: T Long
GTID: 2178330338982200
Subject: Software engineering
Abstract/Summary:
With the application and development of database technology, people have sought to reprocess the data held in databases and build an integrated, analytical environment that supports decision analysis; this led to the concept of the Data Warehouse. The central requirement of a Data Warehouse is to extract data from operational databases accurately, safely, and reliably and to transform it into well-structured information. This helps managers analyze and make decisions effectively, and provides a suitable data environment for Data Mining and Knowledge Discovery. When building a Data Warehouse, the heaviest and most troublesome task is extracting, transforming, and loading data from the business databases into the warehouse, a process known as ETL. Because warehouse data comes from multiple business systems, source quality varies, and the business logic is complex, data quality problems are inevitable when building a data warehouse. Some application projects have even failed because of poor data quality, which leads users to doubt the correctness and usability of the information system. Data quality is therefore the main factor that determines whether an information system can fulfill its role. Quality maintenance and improvement accompany data throughout its life cycle. Improving data quality is a systematic engineering effort: an iterative process that combines quality evaluation with data cleansing.
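The abstract describes data quality work as an iterative loop of quality evaluation and data cleansing. The following is a minimal Python sketch of such a loop under stated assumptions, not the thesis's implementation: the record fields (`msisdn`, `fee`), the two quality rules, the scoring formula (share of records passing every rule), and the repair step (treating a negative fee as a sign error) are all hypothetical choices made for illustration.

```python
from typing import Callable

Record = dict[str, object]


# Hypothetical quality rules (assumptions for illustration): each returns True
# when a record passes.
def has_msisdn(r: Record) -> bool:
    return bool(r.get("msisdn"))


def non_negative_fee(r: Record) -> bool:
    fee = r.get("fee")
    return isinstance(fee, (int, float)) and fee >= 0


RULES: list[Callable[[Record], bool]] = [has_msisdn, non_negative_fee]


def evaluate(records: list[Record]) -> float:
    """Quality evaluation: fraction of records that pass every rule."""
    if not records:
        return 1.0
    passed = sum(all(rule(r) for rule in RULES) for r in records)
    return passed / len(records)


def cleanse(records: list[Record]) -> list[Record]:
    """One cleansing pass: repair what is repairable, drop what still fails."""
    cleaned = []
    for original in records:
        r = dict(original)  # copy so the input data is left untouched
        fee = r.get("fee")
        if isinstance(fee, (int, float)) and fee < 0:
            r["fee"] = -fee  # assumed repair: treat a negative fee as a sign error
        if all(rule(r) for rule in RULES):
            cleaned.append(r)
    return cleaned


def improve(records: list[Record], target: float = 0.99,
            max_rounds: int = 5) -> tuple[list[Record], float, int]:
    """Alternate evaluation and cleansing until the score reaches the target."""
    score = evaluate(records)
    rounds = 0
    while score < target and rounds < max_rounds:
        records = cleanse(records)
        score = evaluate(records)
        rounds += 1
    return records, score, rounds


data = [
    {"msisdn": "138", "fee": 10},
    {"msisdn": "", "fee": 5},      # missing key: cannot be repaired, dropped
    {"msisdn": "139", "fee": -3},  # sign error: repaired to 3
]
records, score, rounds = improve(data)
print(f"before={evaluate(data):.2f} after={score:.2f} rounds={rounds}")
# prints: before=0.33 after=1.00 rounds=1
```

Keeping evaluation separate from cleansing means the same score can be reported before and after each pass, which is what makes the process measurable as an iteration rather than a one-off fix.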
In this paper, based on the design and development of the HUNAN mobile business analysis system, an approach is proposed that improves the data quality of a data warehouse through ETL scheduling and data checking, and several key technologies within this framework are examined. First, ETL scheduling automates data management by organizing the data into a data stream, with the source database as the starting point of ETL and the data warehouse, which holds high-quality data, as its end point. Second, instances are generated for the data flows that ETL forms, and historical records are kept for data that has completed the ETL processes, so that the ETL flow can be monitored. Third, the processing logic of the ETL processes is built together with an automatic check program, so that the check procedure can correct erroneous data and ultimately yield high-quality data. Finally, the proposed method was applied to the HUNAN mobile business analysis system. The implementation of the project showed that invalid data flows can be handled in advance, ensuring the accuracy of the data in the data warehouse.
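The three mechanisms in the paragraph above (dependency-ordered ETL scheduling, a per-run instance record kept as history, and an automatic check that corrects or rejects rows) can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the system built for the HUNAN mobile project: the job names, the `RunInstance` fields, the `check_row` rule, the `fee_fen` source field, and the use of Python's standard `graphlib.TopologicalSorter` to derive the run order are all hypothetical.

```python
import graphlib
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

Row = dict
Job = Callable[[list[Row]], list[Row]]
Check = Callable[[Row], Optional[Row]]


@dataclass
class RunInstance:
    """One execution of one ETL job, kept as history for monitoring."""
    job: str
    started: datetime
    rows_in: int
    status: str = "running"
    rows_out: int = 0
    rejected: int = 0


def run_flow(jobs: dict[str, Job], deps: dict[str, set[str]], check: Check,
             source: list[Row], history: list[RunInstance]) -> dict[str, list[Row]]:
    """Run jobs in dependency order, checking every row each job produces.

    Assumes every name in `deps` is also a key of `jobs`.
    """
    graph = {name: deps.get(name, set()) for name in jobs}
    outputs: dict[str, list[Row]] = {}
    for name in graphlib.TopologicalSorter(graph).static_order():
        upstream = sorted(graph[name])
        # Source jobs read the business data; others read their upstream outputs.
        rows = source if not upstream else [r for u in upstream for r in outputs[u]]
        inst = RunInstance(job=name, started=datetime.now(), rows_in=len(rows))
        history.append(inst)
        try:
            produced = jobs[name](rows)
        except Exception:
            inst.status = "failed"
            break  # downstream jobs need this output, so stop the flow here
        checked = []
        for row in produced:
            fixed = check(row)
            if fixed is None:
                inst.rejected += 1
            else:
                checked.append(fixed)
        inst.rows_out = len(checked)
        inst.status = "done"
        outputs[name] = checked
    return outputs


def check_row(row: Row) -> Optional[Row]:
    """Hypothetical check: reject rows without a key, correct negative fees."""
    if not row.get("msisdn"):
        return None
    row = dict(row)
    if isinstance(row.get("fee"), (int, float)) and row["fee"] < 0:
        row["fee"] = -row["fee"]
    return row


jobs = {
    "extract": lambda rows: [{**r, "msisdn": r["msisdn"].strip()} for r in rows],
    "transform": lambda rows: [{"msisdn": r["msisdn"], "fee": r["fee_fen"] / 100}
                               for r in rows],
    "load": lambda rows: list(rows),
}
deps = {"transform": {"extract"}, "load": {"transform"}}
source = [
    {"msisdn": " 138 ", "fee_fen": 1250},
    {"msisdn": "", "fee_fen": 300},
    {"msisdn": "139", "fee_fen": -500},
]
history: list[RunInstance] = []
outputs = run_flow(jobs, deps, check_row, source, history)
for inst in history:
    print(inst.job, inst.status, inst.rows_in, inst.rows_out, inst.rejected)
# prints:
# extract done 3 2 1
# transform done 2 2 0
# load done 2 2 0
```

Because every job run appends a `RunInstance`, an operator can inspect `history` to see which step failed or how many rows a check rejected, which is one way to make the flow monitorable in the sense the abstract describes. Running the check after every job, rather than only before loading, lets bad rows be caught at the earliest step that produces them.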
Keywords/Search Tags:Data Warehouse, Data Quality, ETL, Schedule, Check