
Research On Data Repairing Techniques Based On Editing Rules And Master Data

Posted on: 2018-07-30  Degree: Master  Type: Thesis
Country: China  Candidate: H Yang  Full Text: PDF
GTID: 2348330536452516  Subject: Software engineering
Abstract/Summary:
Data quality management has been recognised as one of the primary tasks of a data management system. For data consistency and accuracy, integrity constraints of various kinds are good at detecting errors in the data, but repairs derived from these constraints are not guaranteed to be correct. This thesis addresses both the mining of editing rules and the repair of inconsistent data. It provides one algorithm that mines editing rules from source relations and master relations, and another that automatically repairs inconsistent data based on editing rules. The algorithms are implemented in Java, with MyEclipse as the development platform, and applied to real-life data. The experimental results show that, on relational databases, the methods can effectively mine editing rules and use them to repair inconsistent data correctly. The concrete research content is as follows.

First, the mining of editing rules from mined conditional functional dependencies (CFDs) is studied. The existing algorithms CFDMiner, FastCFD and CTANE are analyzed and compared, the procedure by which FastCFD computes difference sets is improved, and CFDMiner and FastCFD are implemented. Editing rules are redefined by extending conditional functional dependencies. Because the master schema may differ from the schema of the sample data, an algorithm is provided for discovering one-to-one correspondences from source attributes to master attributes. On this basis, the algorithm for mining editing rules is given, its performance is analyzed, and its execution is illustrated with an example.

Second, building on CFD-based repair, data repair based on editing rules is studied. The existing CFD-based repair techniques are analyzed first, and their concrete realization and theoretical proof are given. The consistency problem and the coverage problem of editing rules are then analyzed and proved to be NP-complete. To achieve certain fixes, a graph-theoretic algorithm is proposed that derives certain regions from editing rules and master data, and the repair algorithm based on editing rules and master data is built on it. CFD-based repair resolves each violation by assigning the target value of the equivalence class to which the violating item belongs, until all tuples reach a consistent state; this cannot guarantee that the repairs are correct. Repair based on editing rules, by contrast, makes full use of clean master data and checks that the attribute values on the left-hand side of a rule are correct before applying it, which prevents rules from being applied mistakenly. The proposed repair method can therefore guarantee correct fixes. An example shows that the proposed algorithm produces the correct results.

Finally, experiments evaluate running time and the number of rules mined, and compare the editing-rule mining algorithm when it is built on different CFD mining algorithms. The results show that the FastCFD-based variant takes more time than the CFDMiner-based variant but mines more rules. The CFD-based repair algorithm and the editing-rule-based repair algorithm are also compared. Their running times are similar, but judged by F-measure, repair based on editing rules is more effective than repair based on CFDs, which supports the validity of data repair based on editing rules.
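The abstract does not give the repair algorithm itself, so the following is only a minimal Java sketch of the idea it describes: an editing rule is applied only when the attribute on its left-hand side has been asserted correct, and the fix is then copied from a matching master tuple. The class `EditingRuleSketch`, the `EditingRule` fields, the `validated` set and the zip/city sample data are all hypothetical names and values chosen for illustration; they are not taken from the thesis.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class EditingRuleSketch {

    /**
     * Hypothetical editing rule: if t[sourceKey] equals m[masterKey] for a
     * master tuple m, then set t[sourceTarget] to m[masterValue].
     */
    public static final class EditingRule {
        final String sourceKey;
        final String masterKey;
        final String sourceTarget;
        final String masterValue;

        public EditingRule(String sourceKey, String masterKey,
                           String sourceTarget, String masterValue) {
            this.sourceKey = sourceKey;
            this.masterKey = masterKey;
            this.sourceTarget = sourceTarget;
            this.masterValue = masterValue;
        }
    }

    /**
     * Returns a repaired copy of the tuple. The rule fires only when its
     * left-hand-side attribute is in the validated set (assumed correct)
     * and all matching master tuples agree on the value to copy.
     */
    public static Map<String, String> repair(Map<String, String> tuple,
                                             Set<String> validated,
                                             EditingRule rule,
                                             List<Map<String, String>> master) {
        Map<String, String> result = new HashMap<>(tuple);
        if (!validated.contains(rule.sourceKey)) {
            return result; // left-hand side not known to be correct: do nothing
        }
        String key = tuple.get(rule.sourceKey);
        if (key == null) {
            return result;
        }
        String fix = null;
        for (Map<String, String> m : master) {
            if (!key.equals(m.get(rule.masterKey))) {
                continue; // master tuple does not match the source tuple
            }
            String candidate = m.get(rule.masterValue);
            if (candidate == null) {
                continue;
            }
            if (fix != null && !fix.equals(candidate)) {
                return result; // master tuples disagree: no certain fix
            }
            fix = candidate;
        }
        if (fix != null) {
            result.put(rule.sourceTarget, fix);
        }
        return result;
    }

    public static void main(String[] args) {
        EditingRule rule = new EditingRule("zip", "zip", "city", "city");
        List<Map<String, String>> master =
                List.of(Map.of("zip", "EH7 4AH", "city", "Edinburgh"));
        Map<String, String> dirty = Map.of("zip", "EH7 4AH", "city", "London");

        // zip asserted correct: the city is taken from the master tuple.
        System.out.println(repair(dirty, Set.of("zip"), rule, master).get("city"));
        // zip not asserted correct: the tuple is left unchanged.
        System.out.println(repair(dirty, Set.of(), rule, master).get("city"));
    }
}
```

The guard on `validated` is the part that distinguishes this from a CFD-style repair, which would change a value to restore consistency without first checking that the attribute it relies on is correct.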
Keywords/Search Tags: Editing Rules, Master Data, Conditional Functional Dependencies, Data Repairing, Equivalence Class