
Domain Concepts Automatically Extracted

Posted on: 2011-10-10 | Degree: Master | Type: Thesis
Country: China | Candidate: X M Yao
GTID: 2208330332977892 | Subject: Computer software and theory
Abstract/Summary:
The role of ontologies is growing in areas such as the Semantic Web, information extraction, and knowledge management, and constructing domain ontologies automatically or semi-automatically has become an urgent task. The concept is the most important and most basic modeling primitive of an ontology: the relations, instances, functions, and axioms of an ontology are all built on concepts. Efficiently extracting domain concepts from a large domain corpus, automatically or semi-automatically, is therefore a major challenge for researchers. To address it, this thesis studies three aspects: Domain Term Extraction, Concept Connotation Acquisition, and Concept Extension (Instances) Studying.

1. Domain Term Extraction. First, a rule-based approach extracts potential terms. Second, mutual information is used to measure how tightly the components of each character string are bound together, and high-frequency domain words are extracted from the character strings. Finally, word co-occurrence is used to analyze the correlation among the high-frequency words in order to identify the true domain terms. One innovation of this thesis is this use of word co-occurrence to analyze the correlation of high-frequency words (for example, instances), which allows domain terms to be extracted automatically.

2. Concept Connotation Acquisition. The thesis uses HowNet, a common-sense knowledge base, and describes each concept by its sememes, so that concept connotation can be acquired automatically. For unknown words (words not found in HowNet), a splitting strategy breaks the word into parts and looks up the sememes of each part separately, so that the sememes of unknown words can also be acquired automatically. In addition, repeated terms are merged by k-means clustering, with the similarity between concepts computed from their sememes.

3. Concept Extension (Instances) Studying. Because rule-based and statistics-based methods are inadequate for this task, the thesis introduces machine learning into concept instance learning and applies a Support Vector Machine (SVM) to learn concept instances. Experimental results demonstrate the validity of this method.

This thesis analyzes the state of research on ontology concept extraction and its open problems, proposes its own approach, and evaluates it experimentally in three steps: Domain Term Extraction, Domain Concept Connotation Acquisition, and Domain Concept Extension (Instances) Studying. The experimental results demonstrate the validity of the approach.
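The abstract does not say which rules are used to extract potential terms. As a minimal sketch, assuming a part-of-speech-tagged corpus and one hypothetical pattern (a maximal run of adjectives and nouns that ends in a noun), rule-based candidate extraction could look like the following. The tag names `a` and `n`, the `min_words` threshold, and the function `candidate_terms` are illustrative assumptions, not the thesis's actual rule set.

```python
from typing import List, Tuple

def candidate_terms(tagged: List[Tuple[str, str]], joiner: str = "",
                    min_words: int = 2) -> List[str]:
    """Extract maximal adjective/noun runs that end in a noun.

    Hypothetical rule: tag "a" = adjective, "n" = noun; any other tag
    ends the current run.
    """
    candidates: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Drop trailing adjectives so every candidate ends in a noun head.
        while run and run[-1][1] != "n":
            run.pop()
        if len(run) >= min_words:
            candidates.append(joiner.join(word for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag in ("a", "n"):
            run.append((word, tag))
        else:
            flush()
    flush()  # the sentence may end inside a run
    return candidates

# English tokens stand in for segmented Chinese words (joiner "" for Chinese).
tagged = [("fast", "a"), ("fourier", "n"), ("transform", "n"), ("is", "v"),
          ("useful", "a"), ("and", "c"), ("kernel", "n"), ("method", "n")]
print(candidate_terms(tagged, joiner=" "))
# ['fast fourier transform', 'kernel method']
```

These candidates are only potential terms; the later statistical steps decide which of them are kept.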
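The mutual-information step of domain term extraction can be illustrated with pointwise mutual information (PMI). The scoring below is an assumption, not the thesis's exact formula: a candidate string s is scored by its weakest internal bond, tightness(s) = min over k of log2( P(s) / (P(s[:k]) * P(s[k:])) ), where each probability is estimated as an occurrence count divided by the total number of characters. Candidates are kept only if they pass hypothetical frequency and PMI thresholds (`min_freq`, `min_pmi`).

```python
import math
from typing import List

def count_occurrences(corpus: List[str], s: str) -> int:
    """Count possibly overlapping occurrences of s in every sentence."""
    total = 0
    for sentence in corpus:
        start = sentence.find(s)
        while start != -1:
            total += 1
            start = sentence.find(s, start + 1)
    return total

def tightness(corpus: List[str], s: str) -> float:
    """Minimum PMI over all left/right splits of s (higher = tighter)."""
    if len(s) < 2:
        raise ValueError("need at least two characters to split")
    n_chars = sum(len(sentence) for sentence in corpus)

    def p(x: str) -> float:
        # Rough estimate: occurrences of x per character position.
        return count_occurrences(corpus, x) / n_chars

    p_s = p(s)
    if p_s == 0.0:
        return float("-inf")  # never seen, so it cannot be a term
    # If s occurs, every substring of s occurs too, so denominators are > 0.
    return min(math.log2(p_s / (p(s[:k]) * p(s[k:])))
               for k in range(1, len(s)))

def select_terms(corpus: List[str], candidates: List[str],
                 min_freq: int, min_pmi: float) -> List[str]:
    """Keep candidates that are both frequent and internally tight."""
    return [c for c in candidates
            if count_occurrences(corpus, c) >= min_freq
            and tightness(corpus, c) >= min_pmi]

corpus = ["abab", "abcd", "cdab"]  # letters stand in for Chinese characters
print(select_terms(corpus, ["ab", "cd", "bc", "ba"], min_freq=2, min_pmi=1.0))
# ['ab', 'cd']
```

Taking the minimum over split points means a long string is only accepted if none of its internal boundaries is weak.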
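The abstract says word co-occurrence is used to analyze the correlation of high-frequency words but does not name the measure. A minimal sketch, assuming sentence-level co-occurrence, the Dice coefficient Dice(a, b) = 2 * df(a, b) / (df(a) + df(b)), and a hypothetical list of seed terms already accepted as domain terms: a high-frequency word is kept if its strongest association with any seed term reaches a threshold. The seed list, threshold, and function names are all illustrative assumptions.

```python
from typing import List, Sequence

def dice(sentences: Sequence[Sequence[str]], a: str, b: str) -> float:
    """Dice coefficient of a and b, counting sentence-level co-occurrence."""
    df_a = df_b = both = 0
    for sentence in sentences:
        tokens = set(sentence)
        in_a, in_b = a in tokens, b in tokens
        df_a += in_a
        df_b += in_b
        if in_a and in_b:
            both += 1
    return 2.0 * both / (df_a + df_b) if df_a + df_b else 0.0

def filter_by_cooccurrence(sentences: Sequence[Sequence[str]],
                           candidates: List[str], seed_terms: List[str],
                           threshold: float) -> List[str]:
    """Keep candidates strongly associated with at least one seed term."""
    kept = []
    for word in candidates:
        score = max((dice(sentences, word, seed) for seed in seed_terms),
                    default=0.0)
        if score >= threshold:
            kept.append(word)
    return kept

sentences = [["svm", "kernel", "margin"], ["svm", "kernel"],
             ["weather", "today"], ["kernel", "margin", "weather"]]
print(filter_by_cooccurrence(sentences, ["kernel", "weather", "margin"],
                             ["svm"], threshold=0.5))
# ['kernel', 'margin']
```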
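For the splitting strategy applied to unknown words, the abstract gives no split algorithm. The sketch below assumes forward maximum matching as the split method and uses a tiny hypothetical lexicon (`LEXICON`) as a stand-in for HowNet; real HowNet entries and sememe inventories are far richer. The sememes of an unknown word are taken here as the union of the sememes of its known pieces, which is one plausible reading of "get these sememes separately".

```python
from typing import Dict, List, Set

# Hypothetical stand-in for HowNet: word -> set of sememes (illustrative only).
LEXICON: Dict[str, Set[str]] = {
    "数据": {"information|信息"},
    "挖掘": {"dig|挖"},
    "技术": {"method|方法"},
}

def forward_max_match(word: str, lexicon: Dict[str, Set[str]],
                      max_len: int = 4) -> List[str]:
    """Split word left to right, always taking the longest known piece."""
    pieces = []
    i = 0
    while i < len(word):
        for length in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + length]
            # Fall back to a single character when nothing longer is known.
            if piece in lexicon or length == 1:
                pieces.append(piece)
                i += length
                break
    return pieces

def sememes_of(word: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Known words use their entry; unknown words pool their pieces' sememes."""
    if word in lexicon:
        return set(lexicon[word])
    result: Set[str] = set()
    for piece in forward_max_match(word, lexicon):
        result |= lexicon.get(piece, set())  # unknown pieces add nothing
    return result

print(forward_max_match("数据挖掘技术", LEXICON))  # ['数据', '挖掘', '技术']
print(sorted(sememes_of("数据挖掘技术", LEXICON)))
# ['dig|挖', 'information|信息', 'method|方法']
```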
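Merging repeated terms with k-means over their sememes can be sketched by encoding each term as a binary vector over the sememe vocabulary and clustering those vectors. Terms that land in the same cluster are treated as one concept. The binary encoding, Euclidean distance, deterministic farthest-first initialization, and the example sememe sets are all assumptions made for illustration; the abstract only states that k-means is used and that similarity is based on sememes.

```python
from typing import Dict, List, Set

def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[List[float]], k: int, max_iter: int = 100) -> List[int]:
    """Plain k-means with a deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        far = max(range(len(points)),
                  key=lambda i: min(sq_dist(points[i], c) for c in centroids))
        centroids.append(list(points[far]))
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def merge_terms(term_sememes: Dict[str, Set[str]], k: int) -> List[List[str]]:
    """Group terms whose sememe sets are close; each group is one concept."""
    terms = list(term_sememes)
    vocab = sorted(set().union(*term_sememes.values()))
    index = {s: i for i, s in enumerate(vocab)}
    points = []
    for term in terms:
        vec = [0.0] * len(vocab)
        for s in term_sememes[term]:
            vec[index[s]] = 1.0
        points.append(vec)
    clusters: List[List[str]] = [[] for _ in range(k)]
    for term, label in zip(terms, kmeans(points, k)):
        clusters[label].append(term)
    return [c for c in clusters if c]

term_sememes = {  # hypothetical sememe sets
    "计算机": {"computer|电脑"},
    "电脑": {"computer|电脑"},
    "算法": {"method|方法", "calculate|计算"},
    "计算方法": {"method|方法", "calculate|计算"},
}
print(merge_terms(term_sememes, k=2))
# [['计算机', '电脑'], ['算法', '计算方法']]
```

A fixed k is a limitation of this sketch: the number of distinct concepts is rarely known in advance, so a real system would need some way to choose it.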
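For concept instance learning, the abstract names an SVM but not its features, kernel, or training algorithm. The sketch below assumes a linear SVM trained with the Pegasos stochastic sub-gradient method, which is a swapped-in choice and not necessarily what the thesis used. It also assumes one hypothetical feature: the number of domain cue words minus the number of off-domain cue words in a small window around a candidate instance, plus a constant bias feature. The cue sets, `lam`, and `iterations` are illustrative.

```python
import random
from typing import List, Sequence, Set

def context_features(tokens: List[str], index: int, domain_cues: Set[str],
                     other_cues: Set[str], window: int = 2) -> List[float]:
    """Hypothetical feature: domain cues minus off-domain cues near tokens[index]."""
    ctx = tokens[max(0, index - window):index] + tokens[index + 1:index + 1 + window]
    score = sum(t in domain_cues for t in ctx) - sum(t in other_cues for t in ctx)
    return [float(score), 1.0]  # the constant 1.0 acts as a bias feature

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.1,
                     iterations: int = 1000, seed: int = 0) -> List[float]:
    """Pegasos: stochastic sub-gradient descent on the regularised hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(X))
        eta = 1.0 / (lam * t)
        violated = y[i] * dot(w, X[i]) < 1.0  # inside the margin or misclassified
        w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation (shrink) step
        if violated:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """+1 = instance of the concept, -1 = not an instance."""
    return 1 if dot(w, x) >= 0.0 else -1

domain = {"language", "programming", "code", "compiler"}
other = {"snake", "zoo", "forest"}
tokens = ["write", "programming", "code", "in", "Python", "compiler"]
print(context_features(tokens, 4, domain, other))  # [2.0, 1.0]

X = [[2.0, 1.0], [3.0, 1.0], [-2.0, 1.0], [-3.0, 1.0]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, -1, -1]
```

An established SVM library with richer context features and a tuned kernel would normally replace this toy trainer; it is shown only to make the classification step concrete.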
Keywords/Search Tags: Domain Term Extraction, Word Co-occurrence, Domain Concept Connotation Acquisition, Domain Concept Extension (Instances) Studying