
Research On Top-k Query In Uncertain Database

Posted on: 2013-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X J Li
Full Text: PDF
GTID: 2248330371972082
Subject: Computer software and theory
Abstract/Summary:
Applications such as data mining, sensor networks, and data retrieval generate large volumes of uncertain data, which are widely used in finance, the military, and other fields. Uncertain data give users imprecise information. In some cases the imprecision can be eliminated completely, but doing so discards important information, so uncertain data must instead be stored and managed effectively. Uncertain databases serve this purpose. Because the answer space can be huge, users are usually most interested in the most important (top-k) answers. Top-k queries are widely used in traditional databases, where both their semantics and their results are clear on precise data. Uncertain data, by contrast, are unreliable and imprecise. An uncertain tuple is characterized by two things, its confidence and its generation rules, so a top-k query over an uncertain database must consider both each tuple's score and its uncertainty. The interplay between score and uncertainty makes traditional techniques inapplicable. Researchers have proposed many top-k query algorithms for uncertain databases, but these algorithms adopt different query semantics and do not integrate tuple scores and probability values well, so their results do not fully satisfy users' needs. Top-k queries on uncertain databases therefore need further study.

First, this thesis studies and analyzes uncertain data and uncertain databases. On the basis of a model of uncertain data, the author defines a novel, unambiguous top-k query semantics for uncertain databases. The query returns k tuples. To decide which tuple is placed at rank i, it compares the tuple most likely to be ranked at i with the tuple second most likely to be ranked at i-1, and returns the better of the two as the final result for rank i. In this way the semantics balances tuple score against uncertainty. In addition, users can set a threshold according to their needs, and every tuple in the query result then has a probability greater than that threshold.

Second, this thesis implements the novel top-k query. Under this data model the number of possible worlds grows exponentially, so materializing all possible worlds makes queries very slow. Two optimization methods are therefore applied to the algorithm. They cut running time and reduce the tuple scan depth, making the algorithm more efficient.

Finally, experiments show that the algorithm is effective on different data sets.
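To make the possible-world model and its cost concrete, the following is a minimal sketch and not the thesis's algorithm. It assumes tuples exist independently (generation rules are ignored), enumerates every possible world, and accumulates the probability that each tuple lands at each rank. The selection rule shown simply picks, for each rank, the tuple most likely to occupy it and applies a user threshold; the thesis's own comparison between ranks i and i-1 is not reproduced here. The function names `rank_probabilities` and `top_k_by_rank` and the tuple layout `(name, score, probability)` are hypothetical.

```python
from itertools import product


def rank_probabilities(tuples, k):
    """Return {name: [P(rank 1), ..., P(rank k)]} by brute-force enumeration.

    tuples: list of (name, score, probability); tuples are assumed to be
    independent, which is a simplification (no generation rules).
    """
    probs = {name: [0.0] * k for name, _, _ in tuples}
    # One possible world = one present/absent choice per tuple: 2**n worlds.
    for world in product([True, False], repeat=len(tuples)):
        world_prob = 1.0
        present = []
        for (name, score, p), exists in zip(tuples, world):
            world_prob *= p if exists else 1.0 - p
            if exists:
                present.append((score, name))
        # Within a world, tuples are ranked by descending score.
        present.sort(reverse=True)
        for rank, (_, name) in enumerate(present[:k]):
            probs[name][rank] += world_prob
    return probs


def top_k_by_rank(tuples, k, threshold=0.0):
    """For each rank, pick the most probable tuple; None if not above threshold."""
    probs = rank_probabilities(tuples, k)
    result = []
    for i in range(k):
        name, p = max(((n, pr[i]) for n, pr in probs.items()),
                      key=lambda item: item[1])
        result.append((name, p) if p > threshold else None)
    return result


if __name__ == "__main__":
    data = [("t1", 100, 0.4), ("t2", 90, 1.0), ("t3", 80, 0.4)]
    print(top_k_by_rank(data, k=2, threshold=0.3))
```

Because the loop visits 2^n worlds for n tuples, this brute-force form is only usable on toy inputs, which illustrates why the abstract calls for optimizations that avoid full enumeration and limit scan depth. Note also that under this simple rule the same tuple can be chosen for more than one rank.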
Keywords/Search Tags:uncertain data, Top-k query, uncertain database