Data Abnormity Repairing And Its Application Research

Posted on: 2020-05-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Zhang
GTID: 2428330614965632
Subject: Computer Science and Technology
Abstract/Summary:
Data quality directly influences the performance of data modeling and analysis, including generalization ability and analytical accuracy. Detecting and correcting data abnormities therefore has practical significance for data quality engineering and data mining. Integrity constraints, the primary means of repairing dirty data in relational databases, are themselves often inaccurate. Existing data repair algorithms assume that inaccurate constraints are oversimplified and refine them by inserting more predicates. In fact, an inaccurate constraint may be oversimplified, so that correct data are erroneously identified as violations, or it may be too precise, so that it fails to identify true violations. To make repairs more accurate, this thesis considers both the addition and the deletion of predicates and proposes an algorithm for the unified repair of data and constraints (DCR). After one round of repair, the algorithm returns a data repair with minimum repair cost that satisfies the constraint set variants. Results on real data show that the algorithm produces more accurate data repairs than existing methods.

Simple data sets do not contain integrity constraints, so existing constraint-based repair algorithms cannot be applied to them. For such data, the thesis proposes a semi-supervised data repair algorithm based on data density, which uses both the pairwise constraints and the density information of the data set. The algorithm follows the principle of minimum change to the data. Density information and pairwise constraints are first used to form temporary clusters; the temporary clusters are then divided or merged according to the pairwise constraints to form the final clustering result. Clustering is carried out together with the repair of inaccurate data. Experiments show that the proposed algorithm effectively improves both the accuracy of data repair and the precision of clustering.
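The effect of adding or deleting predicates on an integrity constraint can be illustrated with a small sketch. It is not the DCR algorithm itself. It assumes that constraints take the form of denial constraints over pairs of tuples (the abstract does not state the constraint class), and the table, attribute names, and the helper `violations` are hypothetical. A pair of rows violates a constraint when all of its predicates hold at once, so deleting a predicate flags more pairs and adding one flags fewer.

```python
from itertools import combinations
import operator

# Hypothetical table: only rows 0 and 1 form a genuine inconsistency
# (same state, higher salary, yet lower tax).
rows = [
    {"state": "NY", "dept": "A", "salary": 5000, "tax": 500},
    {"state": "NY", "dept": "B", "salary": 6000, "tax": 400},
    {"state": "CA", "dept": "A", "salary": 4000, "tax": 600},
]

# A predicate is (attribute, comparison) and is read as t1[attr] <op> t2[attr].
same_state = ("state", operator.eq)
same_dept = ("dept", operator.eq)
higher_salary = ("salary", operator.gt)
lower_tax = ("tax", operator.lt)


def violations(table, constraint):
    """Return index pairs (i, j), i < j, for which every predicate of the
    constraint holds in at least one orientation of the pair."""
    found = []
    for i, j in combinations(range(len(table)), 2):
        for t1, t2 in ((table[i], table[j]), (table[j], table[i])):
            if all(op(t1[attr], t2[attr]) for attr, op in constraint):
                found.append((i, j))
                break
    return found


# Assumed "true" constraint for this toy table.
intended = [same_state, higher_salary, lower_tax]
# One predicate deleted: oversimplified, flags correct cross-state pairs.
oversimplified = [higher_salary, lower_tax]
# One predicate added: too precise, misses the genuine violation.
too_precise = [same_state, same_dept, higher_salary, lower_tax]

print(violations(rows, intended))        # [(0, 1)]
print(violations(rows, oversimplified))  # [(0, 1), (0, 2), (1, 2)]
print(violations(rows, too_precise))     # []
```

In this toy setting, a repair method that can only insert predicates can turn the oversimplified constraint into the intended one, but it cannot recover from the too-precise constraint; that case requires predicate deletion, which is the motivation the abstract gives for considering both operations.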
Keywords/Search Tags: data repairing, data cleaning, integrity constraint, pairwise constraint, density-based clustering