Font Size: a A A

Top-K Keyword Search For Uncertain Data

Posted on:2014-04-01Degree:MasterType:Thesis
Country:ChinaCandidate:Z ZhangFull Text:PDF
GTID:2208330434972973Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
The problems of keyword search and uncertain data management have been considered extensively, however addressed in isolation in the past. This paper introduces a novel method that combines IR-style keyword query to uncertain data structures, which facilitates the information retrieval uncertain data for users without any knowledge of SQL-like structured query languages.Firstly, this paper discusses and aims at two different uncertain data structures, i.e.(1) relational databases and (2) XML schemas. Many uncertain data problems are based on these two models.Then, for these two models, this paper proposes the top-k keyword query definitions and underlying rationales. For (1), the top-k algorithm performs keyword search query on attributes level of relational databases, the k query results of which have maximum rank scores. Rank score of a query result is well-defined, depending on its correlation with query keywords and its possibility under the possible world. An optimized algorithm is introduced to promote effectiveness in the top-k query. For (2), the probabilistic XML data is viewed as a labeled tree, and a concept of Minimum Meaningful Fragment (MMF) is defined as the searching result. A MMF is a minimum subtree of the probabilistic XML data which has a positive probability of containing all keywords. To sort the MMFs a novel scoring function mainly considering the degree of uncertainty information is presented. Due to the high complexity of calculating such ranking scores, several strategies are proposed to improve the performance.At last, the extensive experiment which judges the performance and pruning capability under various k values, numbers of keywords, and distributions probability of uncertain data, etc., demonstrates the practicality and efficiency of these methods.
Keywords/Search Tags:uncertain data, relational database, XML schema, top-k keyword search, possible world
PDF Full Text Request
Related items