
Research On Web Information Extraction Based On Domain Ontology

Posted on: 2010-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: D R Kong
GTID: 2178360275989376
Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, the amount of online information keeps growing, and the Web has become an important source from which people obtain information. However, it is difficult for people to find exactly the information they want on the Web, for three reasons: web pages lack structure, web content is diverse, and pages change dynamically. Web information extraction technology provides a powerful tool for acquiring information. It transforms information expressed in various formats into a uniform representation, which addresses many of these problems with web pages. This thesis first introduces the background, technical scope, and basic applications of information extraction, and analyzes its architecture, key technologies, and evaluation metrics. It then introduces the fundamentals of ontologies, with emphasis on ontology construction and parsing. On this basis, we propose a web information extraction method based on a domain ontology. On the one hand, the method automatically generates a matching model from the concepts, attributes, and hierarchical relations of the domain ontology. On the other hand, it performs syntactic analysis on the text obtained by preprocessing web pages and applies extraction rules to extract information from that text. Finally, the extraction results are written to a database as records so that they can be queried. The main advantage of ontology-based information extraction is that it does not depend on the structure of web pages. In addition, because the knowledge base used for extraction is described and expressed by an ontology, the extraction model gains semantic expressiveness. Restricting the knowledge that extraction relies on to a specific domain also greatly improves extraction accuracy. Following this method, and with practical use in mind, we design and implement an information extraction system based on an ontology of computer job-hunting letters.
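The thesis does not publish its ontology or its extraction rules. The following is therefore only a minimal Python sketch of the pipeline described above, built on explicit assumptions. The ontology fragment (`ONTOLOGY`, the `JobInfo` concept, and its attribute labels and value patterns) is hypothetical, and so are the functions `build_matching_model`, `extract_record`, and `store`. Simple regular expressions generated from attribute labels stand in for the thesis's syntactic analysis and extraction rules. An in-memory SQLite table stands in for the output database.

```python
import re
import sqlite3

# Hypothetical ontology fragment (not from the thesis): each attribute of the
# concept lists surface labels (synonyms) that may introduce it in text, plus
# a regular expression describing what its value looks like.
ONTOLOGY = {
    "JobInfo": {
        "position": {"labels": ["position", "job title"], "value": r"[A-Za-z ]+?"},
        "salary": {"labels": ["salary", "pay"], "value": r"\d+(?:-\d+)?"},
        "location": {"labels": ["location", "work place"], "value": r"[A-Za-z ]+?"},
    }
}


def build_matching_model(concept):
    """Compile one pattern per attribute: any label, a ':' or '=', then the value."""
    model = {}
    for attr, spec in ONTOLOGY[concept].items():
        labels = "|".join(re.escape(label) for label in spec["labels"])
        # The value ends at a field delimiter (';', '.', newline) or end of text.
        model[attr] = re.compile(
            rf"(?:{labels})\s*[:=]\s*({spec['value']})\s*(?:[;.\n]|$)",
            re.IGNORECASE,
        )
    return model


def extract_record(model, text):
    """Apply each attribute pattern to preprocessed plain text; missing -> None."""
    record = {}
    for attr, pattern in model.items():
        match = pattern.search(text)
        record[attr] = match.group(1).strip() if match else None
    return record


def store(records, attrs):
    """Write records to an in-memory SQLite table so they can be queried."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE job ({', '.join(attrs)})")
    placeholders = ", ".join("?" for _ in attrs)
    conn.executemany(
        f"INSERT INTO job VALUES ({placeholders})",
        [tuple(r[a] for a in attrs) for r in records],
    )
    return conn
```

For example, applying the `JobInfo` model to `Job title: Java Developer; Pay: 8000-12000; Work place: Beijing.` yields `{'position': 'Java Developer', 'salary': '8000-12000', 'location': 'Beijing'}`. The patterns are keyed to ontology labels rather than to HTML tags, so the same model applies whatever the page layout. This mirrors the structure independence claimed above.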
The overall framework and main modules of the system are described in detail. The concepts, attributes, and hierarchical relations obtained by parsing the ontology are used to construct an ontology model tree. Information about the objects to be extracted is then extracted from the preprocessed unstructured text according to the structure of this tree. Finally, the experiments are presented and analyzed.
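Below is a Python sketch of one plausible reading of the ontology model tree. It rests on three assumptions. Each concept becomes a node, and subclass relations become parent-child edges. A concept also inherits the attributes of its ancestors, so the attributes to extract for an object are collected by walking down from the root. The concept names and attributes in `CONCEPTS` and `SUBCLASS_OF` are hypothetical, as are the functions `build_model_tree` and `extraction_targets`. The thesis's actual tree structure and traversal may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical parsed ontology: each concept's own attributes, plus child -> parent links.
CONCEPTS = {
    "JobInfo": ["company", "location"],
    "ComputerJob": ["position", "skills"],
    "SoftwareEngineer": ["language"],
}
SUBCLASS_OF = {"ComputerJob": "JobInfo", "SoftwareEngineer": "ComputerJob"}


@dataclass
class ConceptNode:
    name: str
    attributes: list[str] = field(default_factory=list)
    children: list[ConceptNode] = field(default_factory=list)


def build_model_tree(concepts, subclass_of, root):
    """Create one node per concept and link each child under its parent."""
    nodes = {name: ConceptNode(name, list(attrs)) for name, attrs in concepts.items()}
    for child, parent in subclass_of.items():
        nodes[parent].children.append(nodes[child])
    return nodes[root]


def extraction_targets(node, inherited=()):
    """Depth-first walk: a concept's targets are its inherited attributes plus its own."""
    attrs = list(inherited) + node.attributes
    targets = {node.name: attrs}
    for child in node.children:
        targets.update(extraction_targets(child, attrs))
    return targets
```

With the sample data, `extraction_targets(build_model_tree(CONCEPTS, SUBCLASS_OF, "JobInfo"))["SoftwareEngineer"]` is `['company', 'location', 'position', 'skills', 'language']`. These are the concept's own attribute plus those inherited from `ComputerJob` and `JobInfo`. Each list could then drive attribute-level matching of the kind sketched for the matching model.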
Keywords/Search Tags: Domain Ontology, Information Extraction, Matching