
The Research And Application Of Data Cleaning Technique

Posted on: 2006-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: W B Liang
GTID: 2178360155467451
Subject: Computer application technology
Abstract/Summary:
Online transaction processing systems accumulate ever-growing volumes of transaction data. Data warehouses arose to extract useful information from this data and to resolve the contradiction of being "data rich but information poor". In building a data warehouse, data quality is the most important factor. The success of the warehouse depends on it, as do the decision support and trend analysis built on top of it. The data that a warehouse retrieves and refreshes from its various sources inevitably contains errors, so the quality of the data the warehouse is built on must be ensured. This thesis first studies the theory and definitions of data quality, then analyzes the importance of data cleaning and the main cleaning process. It also compares current theoretical cleaning frameworks with commercial applications. Based on this comparison, it proposes a data cleaning framework and presents a partial implementation. The emphasis is on the research and design of an extensible, customizable data cleaning framework. Data cleaning and data transformation are integrated into the framework, and an XML-based process definition language describes each cleaning process. A metadata administration center controls all configuration information used during cleaning. To keep the framework portable across most platforms, it is built with Java and XML technologies, and it can be integrated into other applications to clean dirty data. Finally, a summary describes the framework's overall characteristics as well as some shortcomings to be addressed in future work.
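The abstract describes each cleaning process as an XML process definition interpreted by a Java framework. The following is a minimal sketch of that idea under stated assumptions. The `<process>`/`<step>` element names, the step types (`trim`, `collapse-spaces`, `upper`, `drop-empty`, `dedupe`), and the `CleaningProcess.run` method are all hypothetical and are not the thesis's actual process definition language or API. The sketch uses only the JDK's built-in DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CleaningProcess {

    // Interprets a HYPOTHETICAL process definition such as
    // <process name="..."><step type="trim"/><step type="dedupe"/></process>
    // and applies its steps, in document order, to a list of string records.
    public static List<String> run(String processXml, List<String> records) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(processXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed process definition", e);
        }
        NodeList steps = doc.getElementsByTagName("step");
        List<String> data = new ArrayList<>(records); // mutable working copy
        for (int i = 0; i < steps.getLength(); i++) {
            String type = ((Element) steps.item(i)).getAttribute("type");
            switch (type) {
                case "trim":            // strip leading/trailing whitespace
                    data = map(data, String::trim);
                    break;
                case "collapse-spaces": // replace internal whitespace runs with one space
                    data = map(data, s -> s.replaceAll("\\s+", " "));
                    break;
                case "upper":           // normalize case for comparison
                    data = map(data, s -> s.toUpperCase(Locale.ROOT));
                    break;
                case "drop-empty":      // discard records that are now empty
                    data.removeIf(String::isEmpty);
                    break;
                case "dedupe":          // remove exact duplicates, keeping first occurrence
                    data = new ArrayList<>(new LinkedHashSet<>(data));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown step type: " + type);
            }
        }
        return data;
    }

    // Applies a per-record transformation, returning a new mutable list.
    private static List<String> map(List<String> in, UnaryOperator<String> f) {
        List<String> out = new ArrayList<>(in.size());
        for (String s : in) {
            out.add(f.apply(s));
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<process name=\"customer-names\">"
                + "<step type=\"trim\"/><step type=\"collapse-spaces\"/>"
                + "<step type=\"upper\"/><step type=\"drop-empty\"/><step type=\"dedupe\"/>"
                + "</process>";
        List<String> dirty = List.of("  acme  corp", "ACME CORP", "", "Globex ");
        System.out.println(run(xml, dirty));
    }
}
```

Running `main` prints `[ACME CORP, GLOBEX]`. Keeping the steps in XML rather than in code means a new cleaning process could be configured without recompiling. In the design the abstract outlines, such definitions and their parameters would presumably be stored and served by the metadata administration center rather than hard-coded as they are here.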
Keywords: Data warehouse, Data cleaning, Process Definition Language, Metadata Administration Center