Data Quality Assurance Algorithm For Automatic Pattern Transformation

Posted on: 2019-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 2428330566496860
Subject: Computer technology
Abstract/Summary:
With economic growth and advances in science and technology, every industry has accumulated large amounts of data. The sources are diverse, including large-scale industrial data, telemetry data, social network data, temporal and location data, and text data, and every industry is now deeply affected by data. A major problem that accompanies large data volumes is data quality. Because of constraints such as transmission conditions, acquisition conditions, historical factors, input errors, and system failures, data may be missing or contradictory. In addition, the number of data sources keeps growing, ranging from all kinds of sensors to various systems. These problems make the data difficult to use. To improve data availability, a common approach is to integrate data through schema transformation and to take measures that improve data integrity.

Existing schema matching techniques are not generally applicable: they consider only some characteristics of a relational schema and lack effective strategies for combining its various characteristics. Likewise, existing missing-value handling techniques place restrictions on the data, so they cannot handle all kinds of data well or cover every attribute of a relational schema. The core research question of this thesis is how to transform schemas automatically and in a general way while filling in the missing parts of the original data, so that the two processes are closely combined.

Based on the characteristics of relational schemas, this thesis proposes a general schema matching algorithm built on a weighted scoring mechanism. The algorithm considers all aspects of a relational schema and combines them through optimal weight learning, which gives it high accuracy. Its effectiveness is verified by experiments.

To improve data quality during schema transformation, this thesis proposes an approach based on prediction models. The approach uses the feature selection algorithm proposed in this thesis to select suitable attributes; the attributes are vectorized and expanded, and the features are compressed and denoised by a neural-network-based autoencoder. The data of the schema is then used as the training set, and a prediction model is built for each attribute of the schema. Missing values are filled in by the prediction models, and the data is finally transformed using the matching algorithm above. This missing-value method takes the semantic relevance between attributes fully into account and, thanks to the feature vectorization method proposed in this thesis, can handle attributes of various types. It is a novel and general algorithm. The validity of the prediction model is verified by experiments.

Finally, a prototype system is designed. It integrates the relational schema matching algorithm and the quality-assured schema transformation algorithm proposed in this thesis to provide a schema transformation service, demonstrating the application of the algorithms.
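As a rough illustration of what a weighted scoring mechanism for schema matching can look like, the Python sketch below scores each pair of source and target attributes on three features and combines them with a weighted sum. Everything specific here is an assumption made for illustration and is not taken from the thesis: the three features (name, type, and value overlap), the hand-picked values in WEIGHTS (the thesis learns optimal weights instead), the threshold, and all function and attribute names.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-picked weights. The thesis learns optimal weights;
# these fixed values only illustrate the weighted-sum idea.
WEIGHTS = {"name": 0.5, "type": 0.2, "values": 0.3}


def name_similarity(a, b):
    """String similarity of two attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_similarity(a, b):
    """1.0 if the declared types are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def value_similarity(a, b):
    """Jaccard overlap of the sample values of two attributes."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def score(src, dst, weights=WEIGHTS):
    """Weighted sum of the per-feature similarities of two attributes."""
    parts = {
        "name": name_similarity(src["name"], dst["name"]),
        "type": type_similarity(src["type"], dst["type"]),
        "values": value_similarity(src["values"], dst["values"]),
    }
    return sum(weights[k] * parts[k] for k in weights)


def match_schemas(source, target, threshold=0.4, weights=WEIGHTS):
    """Map each source attribute to its best-scoring target attribute.

    Pairs whose best score falls below the (assumed) threshold are left
    unmatched.
    """
    mapping = {}
    for src in source:
        best = max(target, key=lambda dst: score(src, dst, weights))
        if score(src, best, weights) >= threshold:
            mapping[src["name"]] = best["name"]
    return mapping


# Toy schemas, invented for this example.
source = [
    {"name": "cust_name", "type": "str", "values": ["Ann", "Bob"]},
    {"name": "age", "type": "int", "values": [31, 45]},
]
target = [
    {"name": "customer_name", "type": "str", "values": ["Bob", "Cid"]},
    {"name": "customer_age", "type": "int", "values": [45, 52]},
]

print(match_schemas(source, target))
```

A greedy best-match per source attribute keeps the sketch short; a real matcher might instead solve a one-to-one assignment so that two source attributes cannot claim the same target attribute.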
Keywords/Search Tags:pattern conversion, data quality, pattern matching, relational pattern, neural network