
Classification and relation extraction for semantic Web annotation

Posted on: 2006-08-31
Degree: M.C.Sc
Type: Thesis
University: Dalhousie University (Canada)
Candidate: Satti, Asad R
Full Text: PDF
GTID: 2458390008973341
Subject: Computer Science
Abstract/Summary:
The idea of the Semantic Web (SW) is based on metadata, and the envisioned usage scenarios of the SW assume that the metadata needed by agents and computer programs to perform sophisticated tasks is available. We propose an architecture for generating metadata for the SW from HTML Web pages in a domain of interest. The domain model is represented by a domain ontology. The focus of this thesis is to investigate the automatic generation of metadata for the SW using classification and relation extraction techniques.

The classification module takes a domain ontology and Web pages as input and classifies the Web pages into ontology classes. The performance of several classification algorithms is explored on Web pages of the Four Universities dataset using page text, with a limited-size feature set. Results show that K-NN is the best classifier under biased attribute selection, with a 97% average F-Measure score over all classes. RIPPER is the best under unbiased attribute selection, with a 49% average F-Measure score over all classes and a 60% average F-Measure score over the four best-performing classes.

For relation extraction, our assumption is that any two Web pages that have some kind of relationship must be connected by a link or a path. Our algorithm exploits the link structure of the Web, using breadth-first search for relation extraction. The results show that the hyperlink structure of the Web can be used for relation extraction.
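The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: a breadth-first search over hyperlinks that looks for a path connecting two pages. The function name `find_link_path`, the `max_depth` cutoff, the dictionary representation of the link graph, and the example page names are all assumptions made for illustration and are not taken from the thesis.

```python
from collections import deque


def find_link_path(links, source, target, max_depth=3):
    """Return the shortest hyperlink path from source to target, or None.

    links: dict mapping a page to the list of pages it links to
           (a hypothetical in-memory stand-in for crawled pages).
    max_depth: maximum number of links to follow (an assumed cutoff).
    """
    if source == target:
        return [source]
    queue = deque([[source]])   # each queue entry is a path of pages
    visited = {source}
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue            # do not extend paths beyond the cutoff
        for nxt in links.get(path[-1], []):
            if nxt in visited:
                continue
            if nxt == target:
                return path + [nxt]
            visited.add(nxt)
            queue.append(path + [nxt])
    return None


# Hypothetical link graph for a small university site.
links = {
    "dept.html": ["faculty.html", "courses.html"],
    "faculty.html": ["prof_a.html"],
    "courses.html": ["cs101.html"],
    "prof_a.html": ["cs101.html"],
}

path = find_link_path(links, "dept.html", "cs101.html")
print(path)
```

Under the abstract's assumption, a returned path marks the two pages as candidates for a relation; deciding which ontology relation applies (for example, using the classes assigned to the pages along the path) would be a separate step not shown here.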
Keywords/Search Tags:Relation extraction, Semantic web, Average f-measure score over, Web pages, Metadata, Results show