Font Size: a A A

Keyword Search in Relational Databases

Posted on:2011-02-23Degree:Ph.DType:Thesis
University:The Chinese University of Hong Kong (Hong Kong)Candidate:Qin, LuFull Text:PDF
GTID:2448390002958733Subject:Engineering
Abstract/Summary:
Keyword search in relational databases (RDBs) has been extensively studied recently. A keyword search (or a keyword query) in RDBs is specified by a set of keywords to explore the interconnected tuple structures in an RDB that cannot be easily identified using SQL on RDBMSs. In brief, it finds how the tuples containing the given keywords are connected via sequences of connections (foreign key references) among tuples in an RDB. Such interconnected tuple structures can be found as connected trees up to a certain size, sets of tuples that are reachable from a root tuple within a radius, or even multi-center subgraphs within a radius. In the literature, there are two main approaches, namely schema-based approaches and graph-based approaches. The schema-based approaches are to generate a set of relational algebra expressions and evaluate every such expression using SQL on an RDBMS directly or in a middleware on top of an RDBMS indirectly. Due to a large number of relational algebra expressions needed to process, most of the existing works take a middleware approach without fully utilizing RDBMSs. The graph-based approaches are to materialize an RDB as a graph and find the interconnected tuple structures using graph-based algorithms in memory.;In this thesis, for the schema-based approaches, we propose an efficient algorithm to general all relational algebra expressions in order to find all the connected trees in an RDB. We also study an efficient algorithm to evaluate all the expressions using semijoins in RDBMS . We show that our method can also be extended to answer continuous keyword queries in a relational data stream. We further propose novel algorithms that find sets of tuples that are reachable from a root tuple within a radius, and algorithms that find multi-center subgraphs within a radius. Our algorithms use SQL queries only in order to make fully use of RDBMS. We show that the current commercial RDBMSs are powerful enough to support such keyword queries in RDBs efficiently without any additional new indexing to be built and maintained. The main idea behind our approach is tuple reduction. For the graph-based approaches, we propose an efficient algorithm to find all/top- K multi-center subgraphs in polynomial delay. We also introduce a new kind of keyword query, namely, structural statistics by keywords, to summarize keyword search results into several dimensions. We conducted extensive performance studies using two large real datasets IMDB and DBLP to show the efficiency and effectiveness of all our approaches.
Keywords/Search Tags:Keyword, Relational, Approaches, RDB, Interconnected tuple structures, Using
Related items