The Study Of Ontology Learning From Web Pages

Posted on: 2008-01-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Fu
Full Text: PDF
GTID: 1118360242473058
Subject: Management Science and Engineering
Abstract/Summary:
Ontologies support information exchange, knowledge sharing, and reuse between humans and machines and among machines, and they have therefore attracted growing attention in research and applications. However, the scarcity of domain ontologies is one of the main bottlenecks for both theoretical research and practical application, and ontology learning emerged in response. Ontology learning acquires ontologies from a variety of data sources using automatic or semi-automatic machine learning methods. Compared with the many ontology studies abroad, ontology learning in the Chinese-language environment has only just begun. This dissertation investigates the basic theory and methodology of web-based ontology learning, with the aim of providing theoretical and methodological support for developing a practical Chinese ontology learning system.

Building on existing ontology learning theory, methods, and techniques from abroad, the dissertation draws on results from Chinese natural language processing and studies the theory and methodology of domain concept acquisition, is-a relation learning, attributive relation learning, and ontology instance acquisition in the Chinese-language environment. The detailed work is as follows:

(1) General architecture of an ontology learning system. A general architecture is designed that consists of five functional modules: a resource management module, a universal resource reading and writing module, a data preprocessing module, an ontology extraction module, and an ontology evaluation and editing module. The web-based ontology learning methods proposed in this dissertation can be integrated seamlessly into this architecture.

(2) Multi-strategy domain concept acquisition. This dissertation presents a multi-strategy domain concept acquisition algorithm that combines information extraction, Chinese natural language processing, linguistic methods, and statistical analysis, among other techniques.
Depending on the features of a page block, the method adaptively chooses between two term acquisition methods: one based on information extraction, the other based on a hidden Markov model and the reduction of NP (noun phrase) candidates. The search-engine-based method for identifying synonymy among terms and the filtering algorithm for domain concepts are discussed in detail.

(3) IS-A relation learning. This dissertation proposes two methods for is-a relation learning: one based on web directory discrimination and one based on self-learned context. The former comprises a web directory discrimination algorithm, tagging rules for directories, an implicit web directory pattern discovery mechanism, a disambiguation algorithm used when merging tagged documents, and mapping rules from web directories to is-a relations. The latter comprises a self-learning mechanism for inheritance contexts and an is-a relation acquisition algorithm based on inheritance context. Each method has its own advantages and disadvantages, and the two are complementary.

(4) HowNet-based attributive relation learning. Attributive relations are important but have received little study. First, a set of candidate attributes is obtained using the method based on self-learned context. Second, the composition of the candidate attribute set is analyzed and divided into three types: non-attribute vocabulary, invalid attributes, and valid attributes. HowNet-based filtering algorithms are presented that filter out non-attribute vocabulary using the hyponymy relation described by the attribute sememe, and filter out invalid attributes using the attribute-host relation described by the attribute sememe in HowNet. Third, the dissertation studies the basic rules for attribute mapping and attributive relation pruning based on the domain concept tree. Finally, the corresponding algorithm is proposed.

(5) Ontology instance acquisition. First, the main types of individual knowledge in web pages are analyzed.
Second, ontology-based algorithms are proposed for discriminating the theme concepts of web pages and the characteristics of individual knowledge representation. Third, the work focuses on the design of rules for ontology instance acquisition from web tables, including identification rules for ontology-instance forms, attribute unit identification rules, basic and extended rules for attribute-value unit identification, and instance name recognition rules. Finally, the overall algorithm is described.
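
As a rough illustration of item (5), the sketch below shows one way rule-based instance acquisition from a web table could look. It is not the dissertation's algorithm: the concept `Laptop`, its attribute set, the `min_ratio` threshold, the `name_column` convention, and both function names are hypothetical, and the table is assumed to be already parsed into rows of cell strings.

```python
from typing import Dict, List, Optional, Set

# Hypothetical ontology fragment: one concept and its known attribute names.
CONCEPT_ATTRIBUTES: Dict[str, Set[str]] = {
    "Laptop": {"brand", "cpu", "memory", "price"},
}

# Hypothetical web table, already parsed into rows of cell strings.
SAMPLE_TABLE: List[List[str]] = [
    ["Laptop price list"],
    ["Model", "Brand", "CPU", "Price"],
    ["X1", "Lenovo", "i7", "1200"],
    ["", "Dell", "i5", "900"],
]


def find_attribute_row(table: List[List[str]], attributes: Set[str],
                       min_ratio: float = 0.5) -> Optional[int]:
    """Attribute unit identification (assumed rule): return the index of the
    first row in which at least min_ratio of the cells are known attributes."""
    for index, row in enumerate(table):
        cells = [cell.strip().lower() for cell in row]
        if not cells:
            continue
        hits = sum(1 for cell in cells if cell in attributes)
        if hits / len(cells) >= min_ratio:
            return index
    return None


def extract_instances(table: List[List[str]], concept: str,
                      name_column: str = "model") -> List[dict]:
    """Pair each data row with the attribute row and emit one instance per
    row that carries a non-empty instance name."""
    attributes = CONCEPT_ATTRIBUTES[concept]
    header_index = find_attribute_row(table, attributes)
    if header_index is None:
        return []  # the table does not look like an instance form
    header = [cell.strip().lower() for cell in table[header_index]]
    instances = []
    for row in table[header_index + 1:]:
        if len(row) != len(header):
            continue  # skip rows that do not align with the attribute row
        record = dict(zip(header, (cell.strip() for cell in row)))
        name = record.get(name_column)
        if not name:
            continue  # instance name recognition failed for this row
        values = {attr: value for attr, value in record.items()
                  if attr in attributes and value}
        instances.append({"concept": concept, "name": name,
                          "attributes": values})
    return instances
```

Calling `extract_instances(SAMPLE_TABLE, "Laptop")` on the sample table skips the caption row, treats the second row as the attribute row, and drops the last row because it has no instance name. A real system would need far richer rules, for example for vertical tables or merged cells, which this sketch does not attempt.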
Keywords/Search Tags: Ontology Learning, Domain Concept Acquisition, IS-A Relation Learning, Attributive Relation Learning, Ontology Instance Acquisition