Research On A Model Of Data Completeness And Evaluating Algorithms

Posted on:2014-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Liu

Full Text:PDF

GTID:2268330422950589

Subject:Computer Science and Technology

Abstract/Summary:


With the development of modern information technology, the amount of information has increased sharply. This growth also brings poor-quality data, which seriously hampers the use of information in a digital society, and misinterpreting information can cause great losses. Data quality has therefore become a serious problem and a subject of active discussion.

Handling incomplete data is one of the most common data quality problems, and evaluating data completeness is one of its basic research questions. Existing work on reasoning about data completeness cannot reflect the overall completeness of a data set and also relies on additional completeness information. Existing methods for evaluating data completeness do not account for false null values, i.e. nulls whose values are determined by other values in the data set, and therefore underestimate data completeness. This dissertation studies the evaluation of data completeness and proposes a data completeness model, suitable for different applications, that consists of three kinds of completeness: attribute value completeness, tuple completeness and relation completeness. The latter two are computed from attribute value completeness through a defined computing function. Using functional dependencies, attribute value completeness can be determined accurately, which in turn yields an accurate relation completeness. Based on this model, the problem of evaluating data completeness is formally defined and studied. Lower bounds for the problem are given under different assumptions, together with exact algorithms that attain each of these bounds once the computing functions are defined. For massive data, approximate algorithms based on uniform sampling are proposed, and theoretical analysis shows that they can achieve any given precision. Reservoir sampling is introduced into the approximate algorithms to improve performance on data sets of unknown size while preserving the precision guarantee. Experiments on real and synthetic data show the effectiveness of the model and the efficiency of the proposed exact and approximate algorithms.
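The abstract names the three completeness levels, the role of functional dependencies and the sampling approach, but not their concrete definitions. The Python sketch below fills those in with assumptions that are not taken from the thesis: a tuple is a dict with `None` as null; a null in attribute A counts as a "false null" (complete) when some FD X → A and another tuple with the same X-value and a non-null A determine it; both computing functions are plain averages; the sample size comes from Hoeffding's inequality; and reservoir sampling is the classic Algorithm R. All function names (`build_fd_index`, `value_completeness`, `estimate_relation_completeness`, etc.) are hypothetical.

```python
import math
import random
from statistics import mean

# Hypothetical representation (not from the thesis): a tuple is a dict
# {attribute: value} with None as null; an FD X -> A is (("X1", ...), "A").


def build_fd_index(relation, fds):
    """For each FD X -> A, map every X-value to the first non-null A-value seen."""
    index = {}
    for lhs, rhs in fds:
        table = {}
        for t in relation:
            key = tuple(t.get(a) for a in lhs)
            if None not in key and t.get(rhs) is not None:
                table.setdefault(key, t[rhs])
        index[(lhs, rhs)] = table
    return index


def value_completeness(t, attr, fd_index):
    """1.0 if t[attr] is non-null or is a false null implied by an FD, else 0.0."""
    if t.get(attr) is not None:
        return 1.0
    for (lhs, rhs), table in fd_index.items():
        key = tuple(t.get(a) for a in lhs)
        if rhs == attr and None not in key and key in table:
            return 1.0  # another tuple with the same X-value fixes this value
    return 0.0


def tuple_completeness(t, attrs, fd_index):
    # Assumed computing function: mean of attribute value completeness.
    return mean(value_completeness(t, a, fd_index) for a in attrs)


def relation_completeness(relation, attrs, fd_index):
    # Assumed computing function: mean of tuple completeness (exact, one pass).
    return mean(tuple_completeness(t, attrs, fd_index) for t in relation)


def reservoir_sample(stream, k, rng):
    """Algorithm R: a uniform k-sample from a stream whose length is unknown."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randrange(i + 1)  # item i is kept with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample


def hoeffding_sample_size(eps, delta):
    """Smallest k with 2*exp(-2*k*eps^2) <= delta, for scores in [0, 1]."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def estimate_relation_completeness(stream, attrs, fd_index, eps, delta, rng):
    """Estimate relation completeness within +/- eps with probability >= 1 - delta."""
    k = hoeffding_sample_size(eps, delta)
    sample = reservoir_sample(stream, k, rng)  # whole stream if it has < k tuples
    return mean(tuple_completeness(t, attrs, fd_index) for t in sample)


if __name__ == "__main__":
    R = [
        {"zip": "z1", "city": "c1", "name": "A"},
        {"zip": "z1", "city": None, "name": "B"},  # false null: zip -> city
        {"zip": "z2", "city": None, "name": None},  # genuine nulls
    ]
    attrs = ["zip", "city", "name"]
    idx = build_fd_index(R, [(("zip",), "city")])
    print(relation_completeness(R, attrs, idx))  # ≈ 0.778 (7/9)
    print(relation_completeness(R, attrs, {}))   # ≈ 0.667 (2/3): every null counted as missing
    stream = iter(R * 1000)  # the sampler is never told the stream length
    print(estimate_relation_completeness(stream, attrs, idx, 0.05, 0.05, random.Random(0)))
```

The toy relation shows the underestimation the abstract describes: ignoring the FD `zip → city` lowers the score from 7/9 to 2/3. Hoeffding's bound also holds for sampling without replacement, so the reservoir sample keeps the same guarantee. As a simplification, the FD index is built over the full data before sampling; resolving false nulls within a single streaming pass would need a different design.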

Keywords/Search Tags:

data quality, data completeness, evaluating data completeness, uniform sampling


Related items

1	Accuracy and Completeness as Measures of the Quality of Volunteered Point-Feature Geospatial Data and Evaluation of the Effect of Demographics on that Quality
2	Technology For Answering Queries On Incomplete Data
3	Data Quality Assessment Model And Quality Propagation For Relational Database
4	Research On Data Source Selection Technology For Missing Value Filling
5	Research On Signature-based Data Integrity Evaluation Technology
6	Research On Signature-based Data Consistency Evaluation Technology
7	Result Completeness Guarantee Strategy Studies In Distributed Stream Join Systems
8	Platform Design Of Integrated Management Based On Data Mining
9	Autoreducibility, Nonuniform Completeness, and Random Oracle
10	Evaluation And Correlation Analysis On Information Completeness Of Meta-analyses Abstracts And Full Texts