
Research on Data Cleaning in Web Service-based Information Integration System

Posted on: 2008-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: N Wang
GTID: 2178360212474578
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer networks and database technology, people have more and more ways to obtain data, and the volume of data is increasing dramatically. The value of this data lies in its quality rather than its quantity, and decisions based on bad data cannot be trusted. The quality of data processing is in turn directly tied to data quality and availability. With data this large and chaotic, manual processing is very difficult, and data quality has become a "bottleneck" in data applications. Correcting data errors is an important way to avoid wrong decisions and reduce the risk of decision-making. Data cleaning is used to accomplish this arduous task.

This paper introduces the concept of data quality, classifies data quality problems by type, and surveys data cleaning tools aimed at the different types of problem. It describes the structure of the data cleaning framework, its workflow, and the function of each module designed to ensure data quality in a Web Service-based Information Integration System.

The data cleaning framework proposed in this paper provides the following major functions: (1) the design and implementation of a data preparation module, which reduces the more complicated problem of multiple data sources to the relatively simple problem of a single data source; (2) the design and implementation of the modules that carry out the cleaning itself, namely the data selection module, the data standardization module, the duplicate detection module, and the data mapping module; (3) the design and implementation of a system maintenance and extension interface, which makes the data cleaning system easy to update and maintain; (4) the provision of data dictionaries and rule libraries, which greatly increase the flexibility of the data cleaning system.
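The module sequence listed in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives no code, so every function name, the `DATA_DICTIONARY` contents, the `same_name_and_city` rule, and the field names are hypothetical, chosen only to show how a data dictionary and a rule library might plug into preparation, selection, standardization, duplicate detection, and mapping steps.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]

# Hypothetical data dictionary: per-field map from variant spellings
# (lowercased) to a canonical value.
DATA_DICTIONARY: Dict[str, Dict[str, str]] = {
    "city": {"bj": "Beijing", "peking": "Beijing", "sh": "Shanghai"},
}


def prepare(sources: Dict[str, List[Record]]) -> List[Record]:
    """Data preparation: merge several sources into one record list,
    tagging each record with the name of the source it came from."""
    merged = []
    for name, records in sources.items():
        for rec in records:
            merged.append({**rec, "_source": name})
    return merged


def select(records: List[Record], required: Sequence[str]) -> List[Record]:
    """Data selection: keep only records with every required field non-empty."""
    return [r for r in records if all(r.get(f, "").strip() for f in required)]


def standardize(rec: Record) -> Record:
    """Data standardization: collapse whitespace, then replace known
    variants with the canonical value from the data dictionary."""
    out = {}
    for field, value in rec.items():
        cleaned = " ".join(value.split())
        lookup = DATA_DICTIONARY.get(field, {})
        out[field] = lookup.get(cleaned.lower(), cleaned)
    return out


def same_name_and_city(a: Record, b: Record) -> bool:
    """Hypothetical duplicate rule: same name (ignoring case) and same city."""
    return a["name"].lower() == b["name"].lower() and a["city"] == b["city"]


# Hypothetical rule library: rules can be added without touching the pipeline.
RULE_LIBRARY: List[Rule] = [same_name_and_city]


def deduplicate(records: List[Record], rules: Sequence[Rule]) -> List[Record]:
    """Duplicate detection: keep a record only if no rule matches it
    against a record that was already kept (simple pairwise comparison)."""
    kept: List[Record] = []
    for rec in records:
        if not any(rule(rec, k) for k in kept for rule in rules):
            kept.append(rec)
    return kept


def map_fields(rec: Record, mapping: Dict[str, str]) -> Record:
    """Data mapping: rename source fields to the target schema's names."""
    return {mapping.get(field, field): value for field, value in rec.items()}


def clean(sources: Dict[str, List[Record]]) -> List[Record]:
    """Run the modules in the order the abstract lists them."""
    records = prepare(sources)
    records = select(records, required=("name", "city"))
    records = [standardize(r) for r in records]
    records = deduplicate(records, RULE_LIBRARY)
    return [map_fields(r, {"name": "full_name"}) for r in records]
```

Calling `clean` with a dictionary of source names to record lists returns a single deduplicated list in the target schema. Keeping the dictionary and the rules as plain data outside the pipeline functions is one way to get the flexibility the abstract attributes to data dictionaries and rule libraries; the pairwise duplicate check is quadratic and is meant only for illustration.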
Keywords/Search Tags:Information Integration, Web Service, Data Quality, Data Cleaning, Rule Library