Data preparation for Web ontology extraction

Posted on:2001-09-13

Degree:M.S.

Type:Thesis

University:The University of Texas at Arlington

Candidate:Tan, Keng-Woei

GTID:2468390014955295

Subject:Computer Science

Abstract/Summary:

The explosive growth of data on the web makes information management and knowledge discovery increasingly difficult. Applying database techniques to manage web information can help solve these problems. One difficulty is that web documents, unlike structured databases, contain unstructured and semi-structured data. Our hypothesis is that creating ontologies is the key to bridging the gap between semi-structured data and structured databases, and hence to facilitating the application of database techniques.

We capture the ontology by extracting and analyzing data from HTML documents in an application domain. We use the HTML tags to extract raw data, and use WordNet to preprocess the data into concepts that represent the domain's entity types, attributes, and relationships. These concepts will be analyzed to construct the ontological schema for the domain. This thesis focuses on heuristics for cleaning the raw information extracted from the web and reducing it to candidate concepts for the ontological schema.
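
As a rough illustration of the tag-based extraction and cleaning step described above, the following minimal Python sketch pulls text out of label-like HTML tags and reduces it to candidate phrases. It is not the thesis's implementation: the tag set `LABEL_TAGS`, the stop-word list `STOPWORDS`, and the names `LabelExtractor`, `clean`, and `extract_candidates` are all assumptions made for this example, and the WordNet lookup is left out entirely.

```python
from html.parser import HTMLParser
import re

# Assumption: these tags tend to carry label-like text.
# The abstract does not say which tags the thesis actually uses.
LABEL_TAGS = {"title", "h1", "h2", "h3", "th", "b", "strong", "dt"}

# Assumption: a tiny illustrative stop-word list, not the thesis's.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "on"}


class LabelExtractor(HTMLParser):
    """Collects the text found inside any of the LABEL_TAGS."""

    def __init__(self):
        super().__init__()
        self._depth = 0  # number of label tags currently open
        self.raw = []    # raw strings in document order

    def handle_starttag(self, tag, attrs):
        if tag in LABEL_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in LABEL_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.raw.append(data.strip())


def clean(raw_strings):
    """Lowercase, keep alphabetic words, drop stop words and duplicates."""
    seen, candidates = set(), []
    for text in raw_strings:
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in STOPWORDS]
        phrase = " ".join(words)
        if phrase and phrase not in seen:
            seen.add(phrase)
            candidates.append(phrase)
    return candidates


def extract_candidates(html):
    """Return cleaned candidate phrases from one HTML document."""
    parser = LabelExtractor()
    parser.feed(html)
    parser.close()
    return clean(parser.raw)
```

In a fuller pipeline, each candidate phrase would next be checked against WordNet (for example, to merge synonyms or discard non-nouns) before being proposed as an entity type, attribute, or relationship; that step is omitted here because the abstract gives no detail on how it is done.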

Keywords/Search Tags:

Web, Data, Information

Related items

1	Design Of Data Warehouse & Data Mining Base On Hygienic Manager Information System
2	Sqa Software-based Architecture, Large Enterprise Data Center Design And Realization
3	Based On The Data Large Centralized Model Credit Information Systems Design And Implementation
4	Heterogeneous Data Of The National Field Experiment Station Promotion And Information Dissemination System To Achieve
5	The Research And Implementation Of Information Assets Based On Big Data Analysis Of WEB Information
6	Application Of ETL Technology In Petrochemical Information Heterogeneous Data Based On XHQ
7	Appliance Of Data Ware Technology In Achieving The Preferential Information Aggregation
8	Study And Implementation Of Web Information Gathering And Data Statistics Technology
9	Research And Implementation Of A Stroke-oriented Personalized Health Information Services System
10	Research On Information Hiding Algorithm Of Image And DEM Data