Research On The Accuracy Of The Data Quality

Posted on: 2014-12-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z S Yang
GTID: 2268330422950583
Subject: Computer Science and Technology
Abstract/Summary:
With the progress of science and technology and the rapid development of society, data sets keep growing, and managing large data sets effectively has become a major challenge. Before using data, people need to know how accurate the data set is so that they can make reasonable decisions; the accuracy of a data set is therefore a very important aspect of data quality. On the one hand, the overall accuracy of a data set is sometimes low while the accuracy of one part of it is high, and the data people need may lie in that part. On the other hand, the diversity of data types and data sets also poses a huge challenge for accuracy evaluation.

We call the overall accuracy of a data set its absolute accuracy, and the accuracy of a query result set or of part of the data set its relative accuracy. Because of the variety of query types, the diversity of data set representations, and the diversity of data types, there is no effective evaluation framework that solves the above problems in a consistent way. Existing algorithms mostly concentrate on evaluating one data type in one data set; they are inflexible and difficult to apply to real-world problems.

For the accuracy evaluation of data sets, this thesis presents a systematic framework. Different data sets have their own basic units, and the thesis computes the accuracy of a data set from the accuracy of its basic units; for different data types, we define a new accuracy evaluation criterion based on relative error. On this basis, we design a relative accuracy evaluation framework for multi-modal data. Within this framework, we divide the data types into three categories and develop accuracy evaluation algorithms for each category, both in the presence and in the absence of true values. We also give methods to handle data updates and an algorithm that improves accuracy estimation using functional dependencies. For relative accuracy estimation, the thesis gives methods to calculate query precision and recall; we use the query precision, recall, F-measure, and the accuracy of the query result set to indicate the relative accuracy of a query. We also give methods to improve the accuracy of queries. Finally, extensive experimental results show the effectiveness and efficiency of the proposed framework.
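The query-level measures named in the abstract can be illustrated with their standard textbook definitions. The sketch below is not the thesis's own algorithm: the function name `query_quality` and the representation of query results as sets of record identifiers are assumptions made for illustration only.

```python
def query_quality(returned, relevant):
    """Return (precision, recall, F-measure) for one query.

    returned: identifiers of the records the query actually returned
    relevant: identifiers of the records a fully accurate data set
              would have returned (hypothetical ground truth)
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # correctly returned records

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Example: 4 records returned, 5 expected, 3 in common.
p, r, f = query_quality({1, 2, 3, 4}, {2, 3, 4, 5, 6})
print(p, r, round(f, 4))
```

With three of four returned records correct and three of five expected records found, precision is 0.75 and recall is 0.6. How the thesis estimates these quantities when true values are absent is not described in the abstract, so the sketch assumes ground truth is available.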
Keywords/Search Tags: data quality, absolute accuracy, relative accuracy