Study On Data Cleaning Based On XML And Its Application

Posted on: 2007-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Tan
Full Text: PDF
GTID: 2178360212968403
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology, enterprises have accumulated large amounts of data, and managers rely on these data more and more when making decisions. Data cleaning is therefore vital to improving the data quality of information systems, and its study has both theoretical and practical value. This dissertation studies data cleaning; its main contributions are as follows:

① The importance and necessity of the study are presented. The current state of data cleaning research is analyzed, and open problems in that research are identified. The emphasis is on data quality problems, the basic principles of data cleaning, and XML-related technologies.

② Building on structured data cleaning, the dissertation studies data cleaning for semi-structured XML data. The results include: 1) XML data cleaning is divided into two categories, namely schema-level data cleaning and instance-level data cleaning; 2) According to the characteristics of XML Schema, schema-level data cleaning covers name conflicts, data type conflicts and constraint conflicts. The dissertation elaborates the cleaning rules, methods and implementation steps for each conflict; 3) For instance-level XML data cleaning, the dissertation makes some exploratory studies of cleaning similar duplicate XML data and proposes a method based on the detection of similar duplicate records in structured data cleaning.

③ Implementation of data cleaning. A duplicate detection algorithm is applied to cleaning customer information data, in accordance with the business needs of building the Chongqing Mobile integrated accounting system. The experimental results show that these methods are feasible and effective.

In short, to make data more accurate and consistent and to support correct decisions, data cleaning cannot be ignored.
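The abstract does not spell out the duplicate detection algorithm, so the following is only a minimal sketch of what instance-level detection of similar duplicate XML records might look like, not the dissertation's method. It assumes a flat list of customer elements, compares a few child fields with a generic string similarity measure from the Python standard library, and flags pairs whose average similarity reaches a threshold. The element names, the sample records, the 0.9 threshold and the helper names (`normalize`, `record_similarity`, `find_similar_duplicates`) are all hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import combinations

# Invented sample data; none of these records come from the dissertation.
SAMPLE = """
<customers>
  <customer id="1"><name>Zhang Wei</name><phone>13900000001</phone><city>Chongqing</city></customer>
  <customer id="2"><name>Zhang  Wei</name><phone>139-0000-0001</phone><city>Chongqing</city></customer>
  <customer id="3"><name>Li Na</name><phone>13900000002</phone><city>Chengdu</city></customer>
</customers>
"""


def normalize(text):
    """Lowercase and keep only letters and digits, so spacing and
    punctuation differences do not affect the comparison."""
    return "".join(ch for ch in (text or "").lower() if ch.isalnum())


def record_similarity(a, b, fields):
    """Average the per-field string similarity of two XML elements."""
    scores = [
        SequenceMatcher(None, normalize(a.findtext(f)), normalize(b.findtext(f))).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_similar_duplicates(root, tag, fields, threshold=0.9):
    """Return id pairs of child elements whose similarity meets the threshold.

    The threshold is an assumed value; a real system would tune it.
    """
    records = root.findall(tag)
    return [
        (a.get("id"), b.get("id"))
        for a, b in combinations(records, 2)
        if record_similarity(a, b, fields) >= threshold
    ]


root = ET.fromstring(SAMPLE)
print(find_similar_duplicates(root, "customer", ["name", "phone", "city"]))
```

With the sample above, records 1 and 2 differ only in spacing and punctuation, so they are the pair reported. Comparing every pair is quadratic in the number of records; approaches for structured data often sort records on a key and compare only neighbours within a window, which this sketch omits for brevity.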
Keywords/Search Tags: Data Cleaning, XML, Similar Duplicate Data, Data Quality