
Research On The Cleaning Method Of Correlated-data Based On Currency Constraints

Posted on: 2021-03-19
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Zhang
Full Text: PDF
GTID: 2518306047482004
Subject: Software engineering
Abstract/Summary:
Data has become a core strategic resource. Social progress and business success alike increasingly depend on data analysis. Data consistency and currency (timeliness), as important components of data quality management, have long been a focus of research in the field. Numerical data is widespread in fields such as medicine and finance. Improving its quality has broad application prospects, whether the data serves as the basis for enterprise data storage, for high-quality data analysis, or for developing related applications, so improving the consistency and currency of numerical data has become a research focus. Currency errors and consistency errors do not occur independently: the two can combine to form errors that are harder to handle. Some research results exist, but they are still insufficient for cleaning numerical data. Rule constraints are an important technique in data quality management, and data items in the real world are correlated with one another. As a hot topic in the field, cleaning correlated numerical data based on currency rules has both theoretical significance and broad application value. The main work of this thesis is as follows:
(1) A numerical functional dependency based on correlated content is proposed. After analyzing the advantages and limitations of numerical functional dependencies, the thesis examines the relationships that exist among data, proposes a numerical functional dependency based on correlated content, and gives its definition and rule discovery method. A data consistency cleaning method is then proposed that uses the inconsistent data detected by these dependencies.
(2) A data cleaning method based on hybrid rules, combining currency constraints with numerical functional dependencies based on correlated content, is proposed. The minimum cost problem under these hybrid rules is proved to be a complete problem, and a framework for cleaning data based on hybrid rules is proposed, covering erroneous-data detection and erroneous-data repair.
(3) On multiple real and synthetic data sets, comparison experiments with traditional rule-constrained cleaning methods verify the effectiveness and performance of the proposed cleaning framework. The experimental results show that the proposed method cleans data better than traditional rule constraints.
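The detection half of such a hybrid-rule framework can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the records, the `STATUS_ORDER` currency constraint (a status only moves forward), the numeric rule `total = base + bonus`, and the helper names `more_current`, `violates_numeric_rule` and `detect`. It only shows how a currency ordering and a numeric consistency check might be evaluated side by side on records that lack reliable timestamps.

```python
from collections import defaultdict

# Hypothetical records: several versions of one entity, without timestamps.
records = [
    {"id": 1, "entity": "e1", "status": "intern",   "base": 3000, "bonus": 0,   "total": 3000},
    {"id": 2, "entity": "e1", "status": "employee", "base": 5000, "bonus": 500, "total": 5500},
    {"id": 3, "entity": "e1", "status": "employee", "base": 5000, "bonus": 500, "total": 5200},
]

# Assumed currency constraint: a status can only advance along this order.
STATUS_ORDER = {"intern": 0, "employee": 1, "manager": 2}


def more_current(a, b):
    """True if the assumed currency constraint says b is more current than a."""
    return STATUS_ORDER[a["status"]] < STATUS_ORDER[b["status"]]


def violates_numeric_rule(r, tol=1e-9):
    """Assumed numeric rule for illustration: total = base + bonus."""
    return abs(r["total"] - (r["base"] + r["bonus"])) > tol


def detect(records):
    """Return (ids that break the numeric rule, currency-order pairs per entity)."""
    inconsistent = [r["id"] for r in records if violates_numeric_rule(r)]

    # Currency constraints only compare records that describe the same entity.
    by_entity = defaultdict(list)
    for r in records:
        by_entity[r["entity"]].append(r)

    order = []
    for group in by_entity.values():
        for a in group:
            for b in group:
                if more_current(a, b):
                    order.append((a["id"], b["id"]))  # a is older than b
    return inconsistent, order


inconsistent, order = detect(records)
```

Here `inconsistent` lists the record ids whose numeric attributes disagree with the assumed rule, and `order` lists pairs `(older, newer)` implied by the assumed currency constraint. A repair step would then have to choose which values to change at minimum cost, which is the part the abstract describes as computationally hard.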
Keywords/Search Tags: Data cleaning, rule constraint, data currency, functional dependencies