
Heuristic rules for extraction of ontology from Web pages in WebOntEx

Posted on: 2001-12-27
Degree: M.S.
Type: Thesis
University: The University of Texas at Arlington
Candidate: Jain, Bhanu Chaturvedi
Full Text: PDF
GTID: 2468390014458406
Subject: Computer Science
Abstract/Summary:
The past few decades have brought explosive growth of data on the World Wide Web. Browsing and intuitive keyword-based searching have proved insufficient for finding and extracting these data easily and in a structured way. This has led many researchers to apply database techniques to information extraction. Web data, however, is unstructured, whereas database techniques require structured data. Hence the need to discover the structure of Web data in order to facilitate search, query, and retrieval.

Currently, most people manually prepare the ontology, or conceptual schema, for a particular application domain and then extract the actual data. Our prototype, WebOntEx, instead extracts the ontology automatically from a set of Web pages in a given domain.

This thesis describes how different Web sites were studied to discover the heuristic rules that specify the mapping between the informational layout of Web pages and meta database constructs. These rules will be programmed into WebOntEx to discover the names of entity types, relationships, attributes, and other constructs that define an ontology. For this thesis, two application domains were selected, analyzed, and observed in order to derive generic rules for extracting ontologies from domain-specific Web pages.
Keywords/Search Tags: Web, Ontology, Extract, Rules, Data