
Research On Duplicate Detection And Cleaning Of Uncertain Data

Posted on: 2013-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: H T Deng
GTID: 2248330362470864
Subject: Computer software and theory
Abstract/Summary:
Recently, the management of uncertain data has attracted great interest from both industry and academia, especially in emerging areas such as wireless sensor networks, biotechnology and biological databases, location-based services, and data streams. To improve the accuracy of the information held in uncertain databases, this thesis builds on the results of previous studies to investigate duplicate detection and data cleaning for uncertain data.

First, drawing on previous research results and on uncertainty theory, the thesis presents an improved model for duplicate detection of uncertain data and introduces the concepts of priority weight and attribute threshold. To make duplicate detection more efficient, the similarities of alternative tuples with high probability are calculated first.

Second, the thesis addresses the cleaning of attribute-uncertain data. An entropy-based metric is proposed to measure the answer quality of probabilistic range queries. To improve query quality under limited resources, the thesis develops both an optimal and an approximately optimal solution, and then extends them to a multi-user scenario in which users share a common query resource budget.

Third, the thesis addresses the cleaning of tuple-uncertain data. A quality metric based on possible world semantics (PWS-EQ) is defined for probabilistic range queries and minimum-value queries. Methods for evaluating PWS-EQ efficiently are discussed, and an optimal quality-aware cleaning algorithm that runs in polynomial time is designed. Finally, the thesis studies efficient algorithms for re-evaluating queries on the cleaned database.

Experiments with detailed analysis are carried out for each of the proposed solutions and algorithms.
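The abstract does not say exactly how priority weights, attribute thresholds, and probability-first ordering fit together, so the following is only a minimal sketch of one plausible arrangement. It assumes that each uncertain record is an x-tuple of mutually exclusive alternatives with probabilities, that two records are compared by their expected similarity, and that Python's `difflib.SequenceMatcher` stands in for the string similarity. The function names, the "similarity below the attribute threshold counts as 0" rule, and the early-termination bounds are hypothetical illustrations, not the thesis's model.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical data model: an x-tuple is a list of mutually exclusive
# alternatives, each a dict (attribute -> string value) with a probability.
Alternative = dict[str, str]
XTuple = list[tuple[Alternative, float]]


def attr_similarity(a: Alternative, b: Alternative,
                    weights: dict[str, float],
                    attr_thresholds: dict[str, float]) -> float:
    """Weighted similarity of two certain alternatives, in [0, 1].

    Assumed reading of "priority weight" and "attribute threshold": each
    attribute's similarity is scaled by its weight (weights sum to 1), and a
    similarity below that attribute's threshold contributes nothing.
    """
    total = 0.0
    for attr, w in weights.items():
        s = SequenceMatcher(None, a.get(attr, ""), b.get(attr, "")).ratio()
        if s >= attr_thresholds.get(attr, 0.0):
            total += w * s
    return total


def is_duplicate(x: XTuple, y: XTuple, weights, attr_thresholds,
                 match_threshold: float) -> tuple[bool, int]:
    """Decide whether the expected similarity of x and y reaches match_threshold.

    Alternative pairs are visited in descending joint probability, and the loop
    stops as soon as the accumulated bounds settle the decision.
    Returns (decision, number of alternative pairs actually compared).
    """
    pairs = sorted(product(x, y), key=lambda pq: pq[0][1] * pq[1][1], reverse=True)
    remaining = sum(p * q for (_, p), (_, q) in pairs)  # probability mass not yet examined
    lower = 0.0  # expected similarity accumulated so far (a lower bound)
    for k, ((a, p), (b, q)) in enumerate(pairs, start=1):
        mass = p * q
        lower += mass * attr_similarity(a, b, weights, attr_thresholds)
        remaining -= mass
        if lower >= match_threshold:
            return True, k
        if lower + remaining < match_threshold:  # even perfect matches on the rest fall short
            return False, k
    return lower >= match_threshold, len(pairs)
```

For two records whose most likely alternatives are identical with a joint probability of 0.72 and a match threshold of 0.7, the loop stops after the first pair, which is the kind of saving that computing high-probability alternatives first is meant to deliver.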
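For the attribute-uncertain case, a small sketch helps show why a cleaning budget leads to "optimal" and "approximately optimal" solutions. It rests on hypothetical assumptions: each object i already has a known probability p_i of lying inside the query range, answer quality is the negative sum of the binary entropies H(p_i), and cleaning object i at an integer cost c_i resolves it completely, so its expected quality gain is H(p_i). Under those assumptions the budgeted problem is a 0/1 knapsack: dynamic programming finds the optimum, and a greedy gain-per-cost rule (compared against the best single object) gives a classic 1/2-approximation. The thesis's own metric and algorithms may differ, and all names here are illustrative.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) in bits; 0 when the outcome is certain (p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def answer_quality(probs: list[float]) -> float:
    """Assumed entropy-based quality of a range-query answer (<= 0; 0 is best)."""
    return -sum(binary_entropy(p) for p in probs)


def optimal_cleaning(probs: list[float], costs: list[int], budget: int):
    """0/1 knapsack DP over integer costs. Returns (total gain, chosen indices)."""
    gains = [binary_entropy(p) for p in probs]
    n = len(probs)
    # best[i][b] = maximum gain using only the first i objects with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        g, c = gains[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]
            if c <= b and best[i - 1][b - c] + g > best[i][b]:
                best[i][b] = best[i - 1][b - c] + g
    # Walk back through the table: a changed entry means object i-1 was taken
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return best[n][budget], sorted(chosen)


def greedy_cleaning(probs: list[float], costs: list[int], budget: int):
    """Greedy by gain/cost ratio (costs must be positive), then compare with
    the best single affordable object; the better of the two is returned."""
    gains = [binary_entropy(p) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: gains[i] / costs[i], reverse=True)
    chosen, spent, total = [], 0, 0.0
    for i in order:
        if spent + costs[i] <= budget:
            chosen.append(i)
            spent += costs[i]
            total += gains[i]
    affordable = [i for i in range(len(probs)) if costs[i] <= budget]
    if affordable:
        top = max(affordable, key=lambda i: gains[i])
        if gains[top] > total:
            return gains[top], [top]
    return total, sorted(chosen)
```

With probabilities [0.5, 0.9, 0.5, 0.2], costs [3, 1, 2, 2] and a budget of 4, the DP cleans objects 2 and 3 (a gain of about 1.72 bits), while the greedy rule picks objects 1 and 2 (about 1.47 bits). That gap is why an exact O(n·B) solution can be worth its cost when the budget B is small.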
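For tuple uncertainty, a possible-world-style quality score can be illustrated as the negative entropy of the distribution of distinct query answers across all possible worlds, loosely in the spirit of the PWS-EQ metric named in the abstract. This sketch assumes independent tuples with existence probabilities (the thesis may use x-tuples or another model) and a range query [lo, hi]; all function names are hypothetical. Enumerating 2^n worlds is exponential, but under independence the answer is determined by one independent event per in-range tuple, so the entropy is additive and can be computed in linear time. Cleaning a tuple then changes the score by a single term, which suggests one way re-evaluation on a cleaned database can stay cheap.

```python
import math
from collections import defaultdict
from itertools import product


def binary_entropy(p: float) -> float:
    """H(p) in bits; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pws_quality_bruteforce(tuples, lo, hi):
    """Enumerate all 2^n possible worlds (exponential; for checking only).

    tuples: list of (value, existence probability), assumed independent.
    Returns sum over distinct answers r of P(r) * log2 P(r).
    """
    answer_prob = defaultdict(float)
    for world in product([False, True], repeat=len(tuples)):
        prob, answer = 1.0, []
        for idx, (exists, (value, p)) in enumerate(zip(world, tuples)):
            prob *= p if exists else 1 - p
            if exists and lo <= value <= hi:
                answer.append(idx)  # identify answers by tuple index, not value
        answer_prob[tuple(answer)] += prob
    return sum(q * math.log2(q) for q in answer_prob.values() if q > 0)


def pws_quality_fast(tuples, lo, hi):
    """Closed form under independence: entropy is additive over in-range tuples."""
    return -sum(binary_entropy(p) for value, p in tuples if lo <= value <= hi)


def clean_tuple(tuples, i, exists):
    """Return a copy in which tuple i is resolved to certainly present or absent."""
    value, _ = tuples[i]
    return tuples[:i] + [(value, 1.0 if exists else 0.0)] + tuples[i + 1:]
```

For tuples [(10, 0.6), (25, 0.5), (40, 0.9), (18, 1.0)] and the range [15, 30], both functions give -1.0, since only the tuple with value 25 is both in range and uncertain; after cleaning that tuple, the score rises to 0.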
Keywords/Search Tags: Uncertain Data, Probabilistic Query, Duplicate Detection, Query Answer Quality, Query Resource Budget, Data Cleaning