Data preparation for Web ontology extraction

Posted on: 2001-09-13
Degree: M.S.
Type: Thesis
University: The University of Texas at Arlington
Candidate: Tan, Keng-Woei
Full Text: PDF
GTID: 2468390014955295
Subject: Computer Science
Abstract/Summary:
The explosive growth of data on the web makes information management and knowledge discovery increasingly difficult. Applying database techniques to manage web information can help solve these problems. One difficulty is that web documents, unlike structured databases, contain unstructured and semi-structured data. Our hypothesis is that creating ontologies is the key to bridging the gap between semi-structured data and structured databases, and hence to facilitating the application of database techniques.

We capture the ontology by extracting and analyzing data from HTML documents in an application domain. We use the HTML tags to extract raw data, and use WordNet to preprocess the data into concepts that represent the domain entity types, attributes, and relationships. These concepts are then analyzed to construct the ontological schema for the domain. This thesis focuses on heuristics for cleaning the raw information extracted from the web and reducing it to candidate concepts for the ontological schema.
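
The tag-driven extraction and cleaning steps described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the set of tags in CANDIDATE_TAGS, the tiny STOP_WORDS list, and the function name candidate_concepts are all assumptions made for illustration, and the WordNet preprocessing step is replaced here by a simple stop-word and punctuation filter so that the example runs with the standard library alone.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: tags whose text is treated as a source of raw terms.
CANDIDATE_TAGS = {"title", "h1", "h2", "h3", "th", "b", "strong"}
# Assumption: a tiny illustrative stop list standing in for WordNet-based cleaning.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on", "our"}


class TagTextExtractor(HTMLParser):
    """Collects text fragments that appear inside any of CANDIDATE_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # current nesting depth inside candidate tags
        self.raw = []   # raw text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in CANDIDATE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in CANDIDATE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.raw.append(data.strip())


def candidate_concepts(html_text):
    """Return a Counter of cleaned candidate terms found in the chosen tags."""
    parser = TagTextExtractor()
    parser.feed(html_text)
    parser.close()
    counts = Counter()
    for fragment in parser.raw:
        for word in fragment.lower().split():
            word = word.strip(".,:;()!?\"'")
            # Keep purely alphabetic tokens that are not stop words.
            if word.isalpha() and word not in STOP_WORDS:
                counts[word] += 1
    return counts
```

Calling candidate_concepts on a page returns term counts that could serve as a rough starting list of candidate concepts. Mapping those terms to entity types, attributes, and relationships would still require the WordNet-based analysis the abstract describes.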
Keywords/Search Tags:Web, Data, Information