Large scale information integration on the Web: Finding, understanding and querying Web databases

Posted on: 2008-03-17 | Degree: Ph.D. | Type: Thesis
University: University of Illinois at Urbana-Champaign | Candidate: Zhang, Zhen | Full Text: PDF
GTID: 2448390005468056 | Subject: Computer Science
Abstract/Summary:
The Web has been rapidly "deepened" by myriad searchable online databases, whose data are hidden behind query interfaces. Because they guard the data behind them, these query interfaces are the "entrances" or "doors" to the deep Web. To open these doors, we have been building the MetaQuerier system, which supports both exploring (finding) and integrating (querying) databases on the Web through their query interfaces. To find Web databases, we need search functionality that dynamically discovers databases relevant to a user's information needs. To query those Web databases, we need to "understand" what a query interface says, i.e., what query capabilities a source supports through its interface, in terms of specifiable conditions. Further, to help users query "alternative" sources, we need to mediate heterogeneous query capabilities across different sources discovered on the fly. Finally, to process queries submitted to a database, we need efficient query processing techniques. To address these challenges, this thesis presents several key components of the MetaQuerier system. First, a search facility searches for useful databases by their schemas. Second, a form extractor extracts the query capabilities of databases by applying a best-effort parsing approach based on hidden syntax. Third, a form assistant translates queries across pairs of interfaces on the fly by deploying a lightweight, domain-based translation framework. Fourth, the OPT* framework processes ranked queries by formulating them as a k-constrained optimization problem. We evaluate our techniques on real databases on the Web. The experimental results show the promise of our system.
Keywords/Search Tags: Web, Databases, Query