Research And Implementation Of Data Quality Rules Mining And Detection System

Posted on: 2013-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Bo
Full Text: PDF
GTID: 2248330362965430
Subject: Software engineering
Abstract/Summary:
In the information age, every industry generates large volumes of information, and because data is the carrier of that information, the amount of data keeps growing. People often complain of being "data-rich but information-poor". There are two main reasons: one is the lack of effective data analysis techniques; the other is low data quality, which prevents the available data from being used effectively. Improving data quality so that data accurately reflects the real world and efficiently supports the operations and decisions of the enterprise is therefore crucial. Good data quality is also the prerequisite and basis for deeper data mining. As a result, data quality has become an increasingly active topic in the database field, and the problem is especially prominent in industries and applications that handle large amounts of data.

Some detection tools exist for data quality issues, but most of them are designed for a single aspect. A duplicate-record detection tool, for example, can only find the duplicate records in a data set and cannot be extended to other functions; that is, it cannot mine or detect other kinds of quality problems. These limitations make the mining and detection of data quality rules a challenging problem with research and development prospects in the field of data management.

In this thesis, we design and implement a generic, scalable data quality rules mining and detection system with a high degree of automation and intelligence. The system can not only detect the duplicate records in a data set, but can also mine the data set's potential data quality rules and use them to test the quality of data sets. Conditional functional dependencies (CFDs) are one kind of data quality rule; the mined CFDs are used for error and conflict detection on a variety of data sets.
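To make the idea of CFD-based error and conflict detection concrete, the following is a minimal sketch in Python. It is not the system described in the thesis: the function names, the dictionary-based row format, the "_" wildcard convention, and the sample data are all assumptions chosen for illustration. A CFD is taken here to be a functional dependency (lhs -> rhs) that only applies to rows matching a pattern of constants; a row is flagged either when it contradicts a required right-hand constant, or when it shares left-hand values with another matching row but disagrees on the right-hand value.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(row, pattern):
    """Return True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def find_cfd_violations(rows, lhs, rhs, lhs_pattern, rhs_pattern=WILDCARD):
    """Return sorted indices of rows violating the CFD (lhs -> rhs, pattern).

    rows        : list of dicts, one per record (hypothetical format)
    lhs         : list of left-hand attribute names
    rhs         : single right-hand attribute name
    lhs_pattern : dict mapping lhs attributes to a constant or WILDCARD
    rhs_pattern : required constant for rhs, or WILDCARD
    """
    violations = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the CFD does not apply to this row
        if rhs_pattern != WILDCARD:
            # Constant rule: a single row can violate it on its own.
            if row[rhs] != rhs_pattern:
                violations.add(i)
        else:
            # Variable rule: group rows by their left-hand values.
            groups[tuple(row[a] for a in lhs)].append(i)
    for idxs in groups.values():
        # Rows that agree on lhs but disagree on rhs conflict with each other.
        if len({rows[i][rhs] for i in idxs}) > 1:
            violations.update(idxs)
    return sorted(violations)


# Illustrative data: within country "UK", zip is assumed to determine street.
rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "07974", "street": "Mtn Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
]
conflicts = find_cfd_violations(
    rows, ["country", "zip"], "street", {"country": "UK", "zip": WILDCARD}
)
print(conflicts)
```

In this sketch only the two UK rows are reported, because the pattern restricts the dependency to rows with country "UK"; the two US rows disagree on street as well, but the rule does not apply to them. That conditional scope is what distinguishes a CFD from an ordinary functional dependency.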
Keywords/Search Tags: Data Quality Miner System, duplicate record, Conditional Functional Dependency