
Research And Implementation Of Ontology-Based Web Information Extraction System

Posted on: 2008-05-01
Degree: Master
Type: Thesis
Country: China
Candidate: W Zhao
Full Text: PDF
GTID: 2178360248952206
Subject: Computer application technology
Abstract/Summary:
As the Internet has developed, it has become one of the most important knowledge repositories, and efficient information extraction from it is highly desirable. How to deliver relevant information from the Internet to users automatically has become an important research problem. The output of an Information Extraction (IE) system is useful to end users directly, and it is also the first step toward building intelligent query systems and data mining systems. IE therefore has good prospects, and research on IE techniques has become a focus of Natural Language Processing internationally.
This thesis first introduces Information Extraction technology, its background, and its history. It analyzes the architecture of IE systems, the taxonomy of IE approaches, the key technologies involved, and the metrics used to evaluate extraction. It also introduces the basic concepts of ontology.
On this basis, the thesis presents a new approach to extracting information from ordinary documents, based on an application ontology that describes a domain of interest. The approach combines Information Extraction with ontology. It first uses the concepts, relations, and keywords of the domain ontology to generate extraction rules automatically, and then performs grammar parsing on the document. It then applies the extraction rules to the parsing result to extract information from the document, and finally outputs the result as a list of records.
Following this approach and practical engineering constraints, an Ontology-Based Web Information Extraction System was designed and implemented. The thesis describes the overall framework of the system and the design of its main modules in detail.
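The rule-generation and extraction steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy ontology, the `generate_rules` and `extract_records` names, and the regex rule form are all assumptions made for illustration, and the grammar-parsing step is replaced here by naive sentence splitting.

```python
import re

# Hypothetical toy ontology (not from the thesis): each concept lists the
# keywords that signal it and a regex describing what its value looks like.
ONTOLOGY = {
    "Price": {"keywords": ["price", "costs"], "value": r"\d+(?:\.\d+)?"},
    "Year": {"keywords": ["year", "built in"], "value": r"(?:19|20)\d{2}"},
}


def generate_rules(ontology):
    """Build one regex per concept: a keyword, up to three words, then the value."""
    rules = {}
    for concept, spec in ontology.items():
        keywords = "|".join(re.escape(k) for k in spec["keywords"])
        pattern = rf"(?:{keywords})\W+(?:\w+\W+){{0,3}}?({spec['value']})"
        rules[concept] = re.compile(pattern, re.IGNORECASE)
    return rules


def extract_records(text, rules):
    """Apply every rule to every sentence; return one record per matching sentence."""
    records = []
    # Assumption: naive sentence splitting stands in for real grammar parsing.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        record = {}
        for concept, rule in rules.items():
            match = rule.search(sentence)
            if match:
                record[concept] = match.group(1)
        if record:
            records.append(record)
    return records


if __name__ == "__main__":
    rules = generate_rules(ONTOLOGY)
    text = "The car was built in 1998. Its price is $4500. Nice color."
    for record in extract_records(text, rules):
        print(record)
```

Deriving the rules from the ontology, rather than writing them by hand, means that adding a concept or keyword to the ontology changes what is extracted without touching the extraction code.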
Because the system relies on an ontology to extract information, the thesis focuses on how to parse an OWL ontology with DOM, and it designs a new ontology storage schema based on the characteristics of the Classes and Properties of OWL ontologies. It also describes how the system is implemented, including its data structures and flow charts. Finally, it presents the results obtained by running the system on a set of test documents and analyzes the extraction results.
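As a rough illustration of DOM-based OWL parsing, the sketch below reads an RDF/XML ontology with Python's standard `xml.dom.minidom` and collects its classes and properties into plain lists. The sample ontology, the `load_ontology` function, and the dictionary layout are assumptions for illustration only; the abstract does not specify the thesis's actual storage schema.

```python
from xml.dom import minidom

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical sample ontology, not taken from the thesis.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Car"/>
  <owl:Class rdf:ID="Dealer"/>
  <owl:ObjectProperty rdf:ID="soldBy">
    <rdfs:domain rdf:resource="#Car"/>
    <rdfs:range rdf:resource="#Dealer"/>
  </owl:ObjectProperty>
  <owl:DatatypeProperty rdf:ID="price">
    <rdfs:domain rdf:resource="#Car"/>
  </owl:DatatypeProperty>
</rdf:RDF>"""


def _name(element):
    """Return the local name from rdf:ID or rdf:about ('' for anonymous nodes)."""
    name = element.getAttributeNS(RDF, "ID") or element.getAttributeNS(RDF, "about")
    return name.split("#")[-1]


def _resource(element, tag):
    """Return the target of the first rdfs:<tag> child, or None if absent."""
    children = element.getElementsByTagNameNS(RDFS, tag)
    if not children:
        return None
    return children[0].getAttributeNS(RDF, "resource").split("#")[-1]


def load_ontology(xml_text):
    """Walk the DOM and collect named classes and properties into simple tables."""
    doc = minidom.parseString(xml_text)
    classes = [
        _name(el) for el in doc.getElementsByTagNameNS(OWL, "Class") if _name(el)
    ]
    properties = []
    for kind in ("ObjectProperty", "DatatypeProperty"):
        for el in doc.getElementsByTagNameNS(OWL, kind):
            properties.append({
                "name": _name(el),
                "kind": kind,
                "domain": _resource(el, "domain"),
                "range": _resource(el, "range"),
            })
    return {"classes": classes, "properties": properties}


if __name__ == "__main__":
    ontology = load_ontology(SAMPLE)
    print(ontology["classes"])
    for prop in ontology["properties"]:
        print(prop)
```

Keeping classes and properties in separate tables, with each property recording its domain and range, is one simple layout that mirrors the class/property distinction in OWL; a relational schema could follow the same split.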
Keywords/Search Tags: Information Extraction, ontology, OWL