Technologies for keyword search in databases

Posted on: 2012-10-15
Degree: M.Sc
Type: Thesis
University: York University (Canada)
Candidate: Shi, Huxia
Full Text: PDF
GTID: 2468390011959371
Subject: Computer Science
Abstract/Summary:
Keyword search over databases, popularized by keyword search on the World Wide Web (WWW), allows ordinary users to access database information without knowledge of structured query languages or database schemas. Research in this field falls into three stages: query pre-processing, query answer generation, and query answer presentation. This thesis mainly focuses on the first two stages.

For query answer generation, we observe that most previous studies use Information Retrieval (IR) style ranking, which fails to consider the importance of the query answers. In this thesis, we propose Collective Importance RANKing (CI-RANK), a new approach for keyword search in databases, which considers the importance of individual nodes in a query answer and the cohesiveness of the result structure in a balanced way. CI-RANK is built upon a carefully designed model called Random Walk with Message Passing (RWMP) that helps capture the relationships between different nodes in the query answer. We develop a branch and bound algorithm to support the efficient generation of top-k query answers. Indexing methods are also introduced to further speed up the run-time processing of queries. Extensive experiments conducted on two real data sets with a real user query log confirm the effectiveness and efficiency of CI-RANK.

For query pre-processing, we address the problem of keyword query segmentation, i.e., how to group nearby keywords in a query into segments. This operation can greatly benefit both the quality and the efficiency of the subsequent search operations. Unlike previous work, the proposed approach is based on Conditional Random Fields (CRF) and provides a principled statistical model that can be learned from search history. Extensive experiments on several real datasets confirm the effectiveness of the proposed approach.
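
To make the segmentation problem concrete, the following minimal sketch enumerates every way of grouping adjacent keywords into contiguous segments. It illustrates only the candidate space, not the CRF-based model the thesis proposes; the function name `segmentations` and the sample query are hypothetical and do not come from the thesis.

```python
from itertools import product


def segmentations(keywords):
    """Return every split of `keywords` into contiguous segments.

    Hypothetical helper for illustration only; it does not score or
    rank the candidates, which is the job of a learned model.
    """
    n = len(keywords)
    if n == 0:
        return [[]]
    results = []
    # Each of the n-1 gaps between adjacent keywords either is a
    # segment boundary (True) or is not (False).
    for breaks in product([False, True], repeat=n - 1):
        segments = []
        current = [keywords[0]]
        for word, is_break in zip(keywords[1:], breaks):
            if is_break:
                segments.append(" ".join(current))
                current = [word]
            else:
                current.append(word)
        segments.append(" ".join(current))
        results.append(segments)
    return results


if __name__ == "__main__":
    # Sample query invented for this sketch.
    for candidate in segmentations(["york", "university", "computer", "science"]):
        print(candidate)
```

A query of n keywords has 2^(n-1) such candidates, so the space grows quickly with query length; a segmentation model has to select among them.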
Keywords/Search Tags: Search, Query