
Incomplete Relation Correction Method Based On Template

Posted on: 2018-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Zhang
Full Text: PDF
GTID: 2428330596451648
Subject: Software engineering
Abstract/Summary:
In an era of rapid development in information technology, computer technology is widely used in many fields and generates large amounts of data, such as production equipment data in industry. These data often come from multiple sources, and it is difficult to guarantee their consistency and integrity during collection, processing, and storage in a database. Inconsistent values, null values, and similar problems arise easily and degrade data quality. High data quality bears on the comprehensiveness and authenticity of the data and is a basic condition for meaningful data analysis results. How to correct inconsistent data and how to handle null values in order to improve data quality is therefore of both research significance and practical value. This thesis studies the industrial production equipment dataset of the Swedish company SSG from two aspects: data inconsistency correction and null value correction. For data inconsistency, it proposes a template-based incomplete relation correction method, which introduces templates as standards against which inconsistent data in incomplete relations are corrected. For null values, it proposes multi-dimensional and multi-model association rules, which exploit latent relationships among the data to fill in the null values. The main work of this thesis is as follows:
1) A search and matching method is proposed for the incomplete relation data to be corrected and the template data. The incomplete relation data are segmented with a word segmentation algorithm based on mutual information, N-Gram, and information entropy, and an extended B-Tree index is built from the segmentation result set. Matching the templates against this B-Tree index gives a preliminary decision on whether a correspondence exists between the template data and the incomplete relation data.
2) A template-based incomplete relation correction algorithm is proposed to correct inconsistent data. A regular language pattern is constructed for the template and combined with the template data to generate the corresponding regular language, which is then used to correct the inconsistent data in the incomplete relation.
3) A null-filling method based on multi-dimensional and multi-model association rules is proposed. A prefix-tree-based lookup method is used to generate frequent prefixes and frequent suffixes, and the Apriori algorithm is used to generate frequent itemsets. Three forms of association rules are then derived from the frequent itemsets: item to item, prefix to item, and suffix to item. Finally, null values are filled in using the generated rules.
4) The method is verified on SSG supplier data. The experimental results show that, compared with the existing edit-distance-based correction method and the null-filling method based on traditional association rules, the proposed method improves both the inconsistency correction rate and the null correction rate, which reach 46.01% and 54.87% respectively.
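The idea in item 2), correcting inconsistent values by matching them against a regular language generated from template data, can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the pattern-construction rule (tokens must appear in order, while case and separators may vary), the function names template_to_regex and correct, and the supplier names are all assumptions made up for illustration.

```python
import re
from typing import List


def template_to_regex(template: str) -> re.Pattern:
    """Build a tolerant pattern from one template value.

    Assumed rule (hypothetical, not from the thesis): the alphanumeric
    tokens of the template must occur in the same order, but letter case
    and the separators between tokens may differ or be missing.
    """
    tokens = re.findall(r"[^\W_]+", template)
    if not tokens:
        raise ValueError("template contains no alphanumeric token")
    body = r"[\W_]*".join(re.escape(tok) for tok in tokens)
    return re.compile(r"^[\W_]*" + body + r"[\W_]*$", re.IGNORECASE)


def correct(value: str, templates: List[str]) -> str:
    """Return the template's canonical form if exactly one template matches.

    Values that match no template, or more than one, are left unchanged so
    that ambiguous cases are not silently rewritten.
    """
    matches = [t for t in templates if template_to_regex(t).match(value)]
    return matches[0] if len(matches) == 1 else value


# Made-up template data standing in for the standardized supplier names.
templates = ["Nordic Pump AB", "Example Valve AB"]

cleaned = [correct(v, templates) for v in
           ["nordic-pump  ab", "EXAMPLEVALVE AB.", "Unknown Supplier"]]
```

In this sketch the first two values are rewritten to their template forms and the third is kept as is, because no template pattern matches it. A real implementation would first narrow the candidate templates through the index described in item 1) instead of trying every template.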
Keywords/Search Tags: SSG, inconsistent data, extended B-Tree, null value correction, multi-dimensional and multi-model association rules