Keyword search in structured and semistructured databases

Posted on: 2005-05-05
Degree: Ph.D.
Type: Dissertation
University: University of California, San Diego
Candidate: Hristidis, Vagelis
Full Text: PDF
GTID: 1458390008486349
Subject: Computer Science
Abstract/Summary:
Keyword search on documents has been extensively studied by the Information Retrieval (IR) community. However, keyword search is becoming increasingly useful for structured and semistructured databases, due to the popularity of XML and the amount of text stored in databases. Keyword queries free the user from having to know the database schema, the role of each keyword, and a query language (SQL, XQuery).

Providing keyword search in databases is challenging on both the semantic and the performance levels. We view a database as a data graph, which captures both the relational and the XML model. A result of a keyword query is a subtree of the data graph. The factors used to rank the results are (i) the IR scores of the attribute values of the result, (ii) the structure of the result, and (iii) the authority flow between the result and the keywords through the data graph (inspired by PageRank). We show how these factors interplay and how they can be combined in meaningful ways that allow efficient execution methods.

On the performance level, we present efficient algorithms to produce all or the top-k results of a keyword query. We study two models: the middleware model, where the system lies on top of an already operational database system to provide keyword querying, and the dedicated system, where we handle the storage of the data and precompute auxiliary data to offer real-time response. The execution techniques are evaluated thoroughly through experiments. Finally, we present a novel technique for presenting the results to the user.
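
To make the data-graph view in the abstract concrete, the following is a minimal illustrative sketch in Python, not the dissertation's algorithm. It assumes an undirected graph stored as an adjacency dictionary, treats each node's text as a bag of lowercase words, and builds one candidate subtree per root by joining the root to the nearest node matching each keyword. The scoring formula (summed term counts divided by one plus the number of edges) is an assumption that stands in for factors (i) and (ii) only; authority flow, factor (iii), is omitted. All names (`keyword_search`, `bfs_parents`, `path_edges`) and the sample data are hypothetical.

```python
from collections import deque
import heapq


def bfs_parents(graph, root):
    """Map every node reachable from root to its BFS parent (root maps to None)."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return parents


def path_edges(parents, node):
    """Collect the (parent, child) edges on the BFS-tree path from the root to node."""
    edges = set()
    while parents[node] is not None:
        edges.add((parents[node], node))
        node = parents[node]
    return edges


def keyword_search(graph, text, keywords, k=3):
    """Return up to k (score, root, edges) results, best score first."""
    results = []
    for root in graph:
        parents = bfs_parents(graph, root)  # insertion order is BFS discovery order
        edges, ir_score = set(), 0
        for keyword in keywords:
            # The first match in discovery order is a nearest match by hop count.
            match = next(
                (n for n in parents if keyword in text.get(n, "").lower().split()),
                None,
            )
            if match is None:
                break  # this root cannot reach every keyword
            edges |= path_edges(parents, match)
            ir_score += text[match].lower().split().count(keyword)
        else:
            # Assumed scoring: reward keyword occurrences, penalize larger subtrees.
            results.append((ir_score / (1 + len(edges)), root, sorted(edges)))
    return heapq.nlargest(k, results)


# Hypothetical bibliographic data graph: papers linked to their authors.
graph = {
    "paper1": ["author1", "author2"],
    "author1": ["paper1"],
    "author2": ["paper1", "paper2"],
    "paper2": ["author2"],
}
text = {
    "paper1": "keyword search in databases",
    "author1": "Smith",
    "author2": "Jones",
    "paper2": "XML query processing",
}

for score, root, edges in keyword_search(graph, text, ["smith", "xml"]):
    print(score, root, edges)
```

Running one breadth-first search per root keeps the sketch short but scales poorly. It is meant only to show what a "result subtree" looks like, and it does not implement the top-k execution techniques the abstract refers to.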
Keywords/Search Tags: Keyword, Data, Result