A framework for transparently accessing deep web sources

Posted on: 2011-08-28
Degree: Ph.D.
Type: Dissertation
University: University of Illinois at Chicago
Candidate: Dragut, Eduard Constantin
Full Text: PDF
GTID: 1448390002967186
Subject: Information Technology
Abstract/Summary:
An increasing number of Web sites expose their content via query interfaces, and many of them offer the same type of products or services (e.g., flight tickets, car rental or purchasing). Together they constitute the so-called "Deep Web". Accessing content on the Deep Web has been a long-standing challenge for the database community. For a user interested in obtaining information about products from alternative Web sites, accessing each site manually is a daunting task. Providing uniform access to these sources is therefore of practical importance, as it lets users search and compare the services and products of multiple providers. We aim to construct an integrated system that makes access to individual sources transparent to users. To achieve this goal, a number of problems need to be addressed. First, for a given domain of discourse (e.g., real estate), a uniform query interface to the data sources has to be constructed. Second, a query formulated on the integrated interface needs to be translated into queries against the interfaces of specific sources. Last, the data returned by individual sources needs to be correctly extracted and the results ranked in descending order of desirability (e.g., by price).

In this dissertation I present a technique to extract Web query interfaces into a hierarchical representation and argue why the hierarchical representation is desirable in practice. I also present a technique for constructing integrated Web query interfaces. Finally, I discuss directions for future research, e.g., constructing meta-search engines over search engines that advertise spatial data, such as restaurants.
Keywords/Search Tags: Web, Sources, Query interfaces, Access