Domain Ontology Construction And Applied Research In The Web Information Extraction

Posted on: 2011-10-08  Degree: Master  Type: Thesis
Country: China  Candidate: C Huang  Full Text: PDF
GTID: 2208360302970050  Subject: Computer application technology
Abstract/Summary:
Information extraction is an important research direction in natural language processing. Its purpose is to gather useful information into a uniform representation so that people can access it more easily. As a natural language processing system, an information extraction system requires the support of a strong knowledge base. Because the structure and content of the knowledge base differ from one information extraction system to another, information extraction technology faces a knowledge bottleneck. As the shared knowledge of a specific domain, an ontology can provide the information needed for semantic annotation. Introducing an ontology into an information extraction system helps the system understand, in a unified way, the concepts of the domain and the relationships between them, and so provide more valuable information to users. This thesis takes the domain ontology as its object of study and examines the construction of a domain ontology and its application in an information extraction system as follows.

First, this thesis surveys the current state of domain ontology applications in information extraction systems and of research on ontology construction methods, and sets the research goal of constructing a domain ontology and applying it in an information extraction system by exploiting the semantic strengths of ontologies.

Second, this thesis proposes a method for constructing a domain ontology that comprises determining the domain, extracting domain-specific concepts and the relationships among them, and editing and storing the ontology.
In the semi-automatic acquisition of concepts for the ontology, keywords of the domain are first obtained by mining domain texts; an improved TF-IDF formula is then applied to extract domain-specific words from the keyword collections, and the ontology concepts are obtained after manual revision of those words. Relations between the concepts are extracted using a WordNet-based approach and a pattern learning method. Finally, the domain ontology is edited and produced with the Protégé tool.

Third, we construct a small domain ontology based on an information extraction platform in the mobile phone field, then combine ontology and information extraction technologies and propose a text information extraction algorithm based on an OWL ontology. The algorithm consults the ontology as the knowledge frame of the domain. Its aim is to extract structured instances of that frame, composed of the OWL ontology's semantic elements such as classes, properties and individuals, to describe the extracted text information. Finally, this thesis presents the results of applying the algorithm to sample web pages from the mobile phone domain and analyzes them.
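The abstract does not give the improved TF-IDF formula, so the following is only a minimal sketch of the general idea behind extracting domain-specific words: plain term frequency in the domain texts weighted by a smoothed inverse document frequency over a contrasting background corpus. The function names, the whitespace tokenizer, the background corpus and the smoothing are all assumptions made for illustration, not the thesis's method.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase, whitespace-split, alphabetic tokens only."""
    return [w.lower() for w in text.split() if w.isalpha()]


def domain_term_scores(domain_docs, background_docs):
    """Score words by domain term frequency times background-corpus IDF.

    Assumption: words frequent in the domain texts but rare in general
    (background) texts are good candidates for domain concepts.
    """
    domain_tokens = [tokenize(d) for d in domain_docs]
    background_sets = [set(tokenize(d)) for d in background_docs]

    tf = Counter(w for doc in domain_tokens for w in doc)
    total = sum(tf.values())
    n_bg = len(background_sets)

    scores = {}
    for word, count in tf.items():
        # Number of background documents that contain the word.
        df = sum(1 for doc in background_sets if word in doc)
        # Add-one smoothing keeps the logarithm defined when df == 0.
        idf = math.log((n_bg + 1) / (df + 1))
        scores[word] = (count / total) * idf
    return scores


def top_candidates(domain_docs, background_docs, k=5):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    scores = domain_term_scores(domain_docs, background_docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    domain = [
        "the phone has a camera and a battery",
        "the phone screen and the battery life",
    ]
    background = [
        "the weather is nice and the sky is clear",
        "a dog and a cat sat on the mat",
    ]
    print(top_candidates(domain, background, k=2))
```

In the workflow the abstract describes, a ranked list like this would only supply candidates; the ontology concepts themselves are fixed after manual revision.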
Keywords/Search Tags: Information extraction, Domain ontology, Ontology construction, Mobile phone