
Research On Data Consistency Maintenance Based On Content-Related Conditional Functional Dependencies

Posted on: 2017-11-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y F Du
Full Text: PDF
GTID: 1318330542489658
Subject: Computer software and theory
Abstract/Summary:
Data consistency is a central issue in data management. It refers to the validity and correctness of the data that represent entities. In practice, however, inconsistencies are inevitable and appear widely in areas such as finance, medical care, and statistics. Inconsistencies introduce semantic ambiguity, which can cause economic and property losses and, in the worst case, endanger health and life. Resolving inconsistencies is therefore a valuable topic in data quality management. Constraints such as functional dependencies and conditional functional dependencies are a technique for enforcing data consistency; they have proved effective and are widely used in applications. In particular, real-life data are content-related and interactive, which contributes to consistency maintenance. Based on content-related conditional functional dependencies, we discuss four problems: rule discovery, data cleaning in centralized databases, data cleaning in distributed databases, and data cleaning with mixed rules. The details of our work are as follows.

(1) Rule discovery. We define content-related conditional functional dependencies (CCFDs) by combining related conditional functional dependencies, which implements consistency detection by putting content-related data together. To find CCFDs, we present a 2-level lattice that clusters the conditional functional dependencies sharing the same conditional attributes and variable attributes and combines these constraints. Moreover, we prove that the problem of discovering minimal CCFD rules is NP-complete, so we adopt a heuristic method. We also introduce dominating values to facilitate rule discovery.

(2) Data cleaning in centralized databases. We propose a cleaning method with minimum repairing cost that uses related data in a centralized database. Content-related data are not independent and influence one another during data repairing. We present a repairing-cost model to measure the modifications made to content-related data. Our method iteratively detects and repairs inconsistencies until all of them are resolved. We also check each repair to avoid deadlock among different constraints.

(3) Data cleaning in distributed databases. In a distributed database, data are allocated to different sites, and communication is a concern. We propose a cleaning method to resolve inconsistencies in distributed databases. Our method iterates detection and repairing until no inconsistency remains. We also present a repairing-cost model to measure the modifications made to distributed data at the sites. We then prove that the problem of minimum-cost repairing is NP-complete, so we adopt a heuristic method for repairing. Furthermore, to improve the accuracy and efficiency of data cleaning, we present two new structures, distinct value and repairing sequence, which help reduce the communication cost.

(4) Data cleaning with mixed rules. Different types of constraints interact and can be mixed for data cleaning. We propose a cleaning method that uses mixed rules combining CCFDs and currency constraints. Entity data may be drawn from different sources with different repairing weights, so we present a new repairing weight that takes content-related data into account. We also prove that the problem of minimum-cost repairing with mixed rules is Σ₂ᵖ-complete (NP^NP). To determine the repairing sequence with mixed rules, we also present a repairing sequence graph.
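
As a rough illustration of the consistency detection that the abstract builds on, the following Python sketch flags tuples that violate a single ordinary conditional functional dependency. It is not the dissertation's CCFD method: the function names, the "_" wildcard convention, and the sample address data are all assumptions made for this example.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed convention: "_" in a pattern matches any value


def matches(row, pattern):
    """True if the row agrees with every constant in the pattern."""
    return all(v == WILDCARD or row[a] == v for a, v in pattern.items())


def cfd_violations(rows, lhs_pattern, rhs_attr, rhs_value=WILDCARD):
    """Return sorted indices of rows violating the CFD (lhs -> rhs_attr).

    lhs_pattern maps each left-hand attribute to a constant or WILDCARD.
    If rhs_value is a constant, every matching row must carry that value.
    If rhs_value is WILDCARD, matching rows that agree on the left-hand
    attributes must also agree on rhs_attr.
    """
    violating = set()
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        if not matches(row, lhs_pattern):
            continue  # the condition does not apply to this row
        if rhs_value != WILDCARD:
            if row[rhs_attr] != rhs_value:
                violating.add(i)
        else:
            key = tuple(row[a] for a in sorted(lhs_pattern))
            groups[key].append(i)
    # Variable right-hand side: a group with more than one value is inconsistent.
    for members in groups.values():
        if len({rows[i][rhs_attr] for i in members}) > 1:
            violating.update(members)
    return sorted(violating)


# Hypothetical sample data: within the UK, zip is assumed to determine street.
rows = [
    {"country": "UK", "zip": "EH4", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4", "street": "Crichton"},
    {"country": "US", "zip": "EH4", "street": "Main"},
    {"country": "UK", "zip": "W1B", "street": "Regent"},
]
print(cfd_violations(rows, {"country": "UK", "zip": WILDCARD}, "street"))  # [0, 1]
```

Rows 0 and 1 share the same UK zip but disagree on street, so both are reported. The US row is ignored because the condition does not apply to it. A repairing step would then have to decide which cell to change, which is where a cost model such as the one described above comes in.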
Keywords/Search Tags: data quality, data consistency, content-related data, integrity constraints, data cleaning