
Study On Inconsistency Detection And Repair Of Multi-table Data

Posted on: 2021-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2428330611998517
Subject: Engineering
Abstract/Summary:
Research on data inconsistency has successively produced conditional functional dependencies, conditional inclusion dependencies, micro-function dependencies, and their extensions. In practical applications, however, these methods cannot resolve overall or partial inconsistency between different attributes across multiple tables. This thesis proposes an inconsistency detection method based on master data and extended micro-function dependencies, together with an inconsistency repair method based on confidence and entropy.

To address overall or partial inconsistency between different attributes of multiple tables, the thesis proposes the extended micro-function dependency, an extension of the micro-function dependency. Master data is introduced to identify erroneous data, which addresses the propagation of inconsistency between different attributes of multiple tables. On this basis, the thesis proposes a multi-table inconsistency detection method based on master data and extended micro-function dependencies, which builds on master data repair. In this method, the attributes are first checked against the master data using conditional inclusion dependencies. Only records that satisfy those dependencies are then checked with micro-function dependencies. Data that violates any of these detection rules is inconsistent.

Because extended micro-function dependencies gain detection accuracy at the expense of time cost, the thesis proposes an incremental detection method. This method identifies the data affected by additions, deletions, and modifications of data or detection rules, and checks that data for inconsistency, which effectively improves detection efficiency.

Beyond detecting inconsistent data, the thesis also studies automatic mining and integrity detection for extended micro-function dependency rules. To ensure the consistency, correctness, and integrity of these rules, it proposes the eCANE algorithm for automatic mining of dependency rules and the FHG method for rule integrity detection.

To repair inconsistent data in multiple tables, the thesis proposes an automatic method based on confidence and entropy. This method mainly repairs data whose confidence is greater than or equal to the confidence threshold, or whose entropy is less than or equal to the entropy threshold. The repair value is determined from the master data and the extended micro-function dependency rules. The remaining inconsistent data is repaired manually and then detected again.

Based on the above methods, a multi-table data inconsistency detection and repair system is designed and implemented. The thesis describes the system architecture, business process, functional modules, key technologies, and implementation results in detail. The system adds a manual review step to the automatic detection and repair process to ensure the accuracy of inconsistency detection and repair.
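
The threshold rule for automatic repair can be illustrated with a minimal Python sketch. The abstract does not say how confidence or entropy are computed, so the definitions here are assumptions made only for illustration: confidence is taken as the share of candidate repair values that agree with the most frequent candidate, and entropy as the Shannon entropy of the candidate distribution. The function names, the threshold defaults, and the idea of a list of candidate values per cell are likewise hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def confidence_and_entropy(candidates):
    """Return (best_value, confidence, entropy) for a list of candidate values.

    Assumed definitions (not from the thesis):
      confidence = frequency share of the most common candidate
      entropy    = Shannon entropy (base 2) of the candidate distribution
    """
    counts = Counter(candidates)
    total = sum(counts.values())
    best_value, best_count = counts.most_common(1)[0]
    confidence = best_count / total
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return best_value, confidence, entropy


def select_repair(candidates, conf_threshold=0.8, entropy_threshold=0.5):
    """Return a repair value, or None to leave the cell for manual repair.

    Mirrors the rule stated in the abstract: repair automatically when
    confidence >= confidence threshold OR entropy <= entropy threshold.
    """
    if not candidates:
        return None  # nothing proposed by master data or rules
    value, confidence, entropy = confidence_and_entropy(candidates)
    if confidence >= conf_threshold or entropy <= entropy_threshold:
        return value
    return None


if __name__ == "__main__":
    # Hypothetical candidates, e.g. proposed by master data and dependency rules.
    print(select_repair(["Beijing"] * 9 + ["Peking"]))   # strong agreement
    print(select_repair(["Beijing", "Peking"]))          # evenly split
```

Under these assumptions, a cell whose candidates mostly agree is repaired automatically, while an evenly split cell returns `None` and would fall into the manually repaired remainder that the abstract says is detected again afterwards.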
Keywords/Search Tags:extended micro-function dependency, inconsistency, incremental detection, data repair