Information extraction and integration for Web databases

Posted on:2005-09-15

Degree:Ph.D

Type:Thesis

University:Hong Kong University of Science and Technology (People's Republic of China)

Candidate:Wang, Jiying

GTID:2458390008493432

Subject:Computer Science

Abstract/Summary:

A large number of the Web pages returned by filling in search forms cannot be indexed by most search engines today because they are generated dynamically by querying a back-end (relational or object-relational) database. Web sites of this kind, referred to as Web databases, usually present complex data objects with nested structures in their pages. In this thesis, we address a variety of problems related to retrieving information from Web databases. To extract structured data embedded in template-generated pages from Web databases, we first develop an algorithm that automatically identifies the data-rich sections of a page and then propose an approach that automatically induces regular-expression wrappers from them. To understand the semantics of both the query interfaces and the extracted data of various Web databases and to integrate them, we propose a combined schema model that describes the distinct schemas of a Web database (global, interface, and result schemas). We then address two significant schema-matching problems for Web databases, intra-site schema matching and inter-site schema matching, and investigate an instance-based method that uses domain-specific query probing to solve the two problems at the same time.
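
The idea behind instance-based matching can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: it assumes that probe values submitted through each interface attribute and the values observed in each extracted result column have already been collected, and it pairs attributes with columns by simple value overlap (Jaccard similarity). The names jaccard, match_attributes, interface_values, result_columns and the threshold of 0.3 are hypothetical choices made for this illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of values, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_attributes(interface_values, result_columns, threshold=0.3):
    """Pair each interface attribute with the result column whose
    instance values overlap most with the attribute's probe values.

    Assumption for this sketch: overlap of lowercased strings is a
    good enough signal; a real system would need fuzzier comparison.
    """
    matches = {}
    for attr, probes in interface_values.items():
        probe_set = {v.lower() for v in probes}
        best_col, best_score = None, 0.0
        for col, values in result_columns.items():
            score = jaccard(probe_set, {v.lower() for v in values})
            if score > best_score:
                best_col, best_score = col, score
        # Keep only pairings that clear the (hypothetical) threshold.
        if best_col is not None and best_score >= threshold:
            matches[attr] = (best_col, best_score)
    return matches


# Hypothetical probe values sent through two fields of a search form.
interface_values = {
    "author": ["knuth", "ullman"],
    "title": ["database systems", "the art of computer programming"],
}

# Hypothetical values extracted from unlabeled columns of result pages.
result_columns = {
    "col_1": ["The Art of Computer Programming", "Database Systems"],
    "col_2": ["Knuth", "Ullman", "Widom"],
}

matches = match_attributes(interface_values, result_columns)
for attr, (col, score) in matches.items():
    print(f"{attr} -> {col} (overlap {score:.2f})")
```

In this toy input the "author" field pairs with col_2 and the "title" field with col_1, which shows how instance values alone can label otherwise anonymous result columns. Matching across different sites could reuse the same comparison on columns drawn from two Web databases, again only as an assumption of this sketch.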

Keywords/Search Tags:

Web, Schema

Related items

1	Research And Implementation On Schema Exchanging Between XML Schema And Relation Schema
2	Research On Data Integration And Exchange Technology Of The Agile Virtual Enterprise
3	The Research And Implementation Of Translating Relational Schema To XML Schema Preserving Semantic Constraints
4	Research On Technology Of Schema Matching Between Global Schema And Local Schema
5	Research On Publishing XML Documents From Enterprise Database
6	A Study Of Coding Index Based On Schema
7	Research And Implementation On Schema Transformation And Query Of XML In Data Integration
8	Research On Schema Theory And Its Application In Control Of A Class Of Underactuated Mechanical Systems
9	Multidatabase System Integrating Platform CMDatabase
10	Research On Key Technologies Of Deep Web Data Integration