Research On Ontology-Based Web Information Extraction Technology

Posted on: 2012-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: W T Cheng
GTID: 2218330368458674
Subject: Computer application technology
Abstract/Summary:
The spread of computers and the rapid growth of the World Wide Web have driven the emergence of web information extraction, which extracts specific information from large numbers of heterogeneous web documents and converts it into unambiguous, structured data for further integration and sharing of web data resources. In recent years, the development of ontology has offered a new angle on web data extraction, and researchers have studied in depth how ontologies can be used to improve extraction performance.

In this thesis, after analyzing the features and existing achievements of information extraction technology and the related theory of ontology, we study ontology-based web information extraction through the following tasks:

(1) Building on an analysis of the theory and applications of ontology, we propose an extraction-oriented ontology model for the thing-descriptive information found in web pages. The model classifies the properties in the ontology and adds a mapping model for location information, which gives the ontology the ability to identify thing-descriptive information in a page.

(2) We present the framework of an ontology-based web information extraction system, which adopts a modular architecture to achieve the system's overall function, and we discuss how the framework is realized.

(3) We propose an ontology-guided web information extraction method as the working principle of the system. In the rule generation phase, the method imports the location information of the extraction-oriented ontology to guide the identification of the core semantic information block in the web page, and creates extraction rules with a path analysis algorithm. In the data extraction phase, it extracts data according to these path-based rules. Finally, it stores the extraction results as an RDF ontology to improve the reusability of the extracted information.

(4) We carry out a comparative experiment on sample pages selected from web sites about books and vehicles. The results indicate that the system achieves better extraction accuracy and runs more efficiently than the ruleless extraction method.
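The two phases in task (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the sample page, the `ONTOLOGY` dictionary (label keywords standing in for the location mapping), the function names, and the `example.org` namespace are all hypothetical. The sketch assumes a well-formed XHTML page, uses only the Python standard library, and writes N-Triples-style lines as a stand-in for the RDF output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; well-formed so the stdlib XML parser can read it.
PAGE = """<html><body>
<div id="nav"><a>Home</a></div>
<div id="detail">
  <table>
    <tr><td>Title</td><td>Semantic Web Primer</td></tr>
    <tr><td>Author</td><td>G. Antoniou</td></tr>
    <tr><td>Price</td><td>42.00</td></tr>
  </table>
</div>
</body></html>"""

# Hypothetical ontology: property name -> label text expected beside its value.
ONTOLOGY = {"title": "Title", "author": "Author", "price": "Price"}


def index_paths(root):
    """Map every element to an indexed path such as ./body[1]/div[2]."""
    paths = {root: "."}
    for parent in root.iter():  # document order: parents come before children
        counts = {}
        for child in parent:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            paths[child] = f"{paths[parent]}/{child.tag}[{counts[child.tag]}]"
    return paths


def generate_rules(root, ontology):
    """Rule generation: record the path of the value cell next to each label."""
    paths = index_paths(root)
    rules = {}
    for row in root.iter("tr"):
        cells = row.findall("td")
        if len(cells) != 2:
            continue
        label = (cells[0].text or "").strip()
        for prop, keyword in ontology.items():
            if label == keyword:
                rules[prop] = paths[cells[1]]
    return rules


def extract(root, rules):
    """Data extraction: follow each stored path and read the text found there."""
    record = {}
    for prop, path in rules.items():
        node = root.find(path)
        if node is not None and node.text:
            record[prop] = node.text.strip()
    return record


def to_ntriples(subject, record, ns="http://example.org/book#"):
    """Serialize a record as N-Triples-style lines (escapes quotes and backslashes only)."""
    lines = []
    for prop, value in record.items():
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{ns}{prop}> "{literal}" .')
    return lines


root = ET.fromstring(PAGE)
rules = generate_rules(root, ONTOLOGY)
record = extract(root, rules)
triples = to_ntriples("http://example.org/book/1", record)
```

Because the rules are plain paths, they can be generated once from a sample page and then reused on other pages that share the same template, which is the kind of reuse that separates a rule-based approach from a ruleless one.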
Keywords/Search Tags: web information extraction, ontology, semantic, extraction rule, resource description framework (RDF)