Domain Concepts Automatically Extracted

Posted on:2011-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:X M Yao

Full Text:PDF

GTID:2208330332977892

Subject:Computer software and theory

Abstract/Summary:

Ontologies play an increasingly important role in the Semantic Web, information extraction, knowledge management, and related fields, so constructing domain ontologies automatically or semi-automatically has become an urgent task. The concept is the most important and most basic modeling primitive in an ontology; the relations, instances, functions, and axioms of an ontology are all built on it. Extracting domain concepts efficiently from a large domain corpus, automatically or semi-automatically, is therefore a major challenge for researchers. To address this issue, this thesis studies three aspects: Domain Term Extraction, Concept Connotation Acquisition, and Concept Extension (Instances) Studying.

1. Domain Term Extraction. First, a rule-based approach is used to extract candidate terms. Second, mutual information is used to measure the internal cohesion of character strings, and high-frequency domain words are extracted from them. Finally, word co-occurrence is used to analyze the correlation among the high-frequency words in order to identify the true domain terms. One of the innovations of this thesis is the introduction of word co-occurrence to analyze the correlation of high-frequency words, such as instances, so that domain terms can be extracted automatically.

2. Concept Connotation Acquisition. This thesis uses HowNet, a repository of common-sense knowledge, and describes each concept by its sememes, so that concept connotations can be acquired automatically. For unknown words, a splitting strategy is adopted: the sememes of the parts are obtained separately, so that the sememes of the unknown word can be acquired automatically. In addition, repeated terms are merged with k-means clustering, where the similarity between concepts is computed from their sememes.

3. Concept Extension (Instances) Studying. To address the shortcomings of rule-based and statistical methods, this thesis introduces machine learning into the process of learning concept instances, using a Support Vector Machine (SVM). The experimental results demonstrate the validity of this method.

This thesis analyzes the state of research on ontology concept extraction and its open problems, proposes its own approach, and carries out experiments in three steps: Domain Term Extraction, Domain Concept Connotation Acquisition, and Domain Concept Extension (Instances) Studying. The experimental results demonstrate the validity of the approach.
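
As a rough illustration of the two statistics named in step 1, the following Python sketch scores adjacent token pairs by pointwise mutual information and counts sentence-level co-occurrence among candidate terms. It is not the thesis's implementation: the function names, the `min_count` threshold, the pre-tokenized input, and the choice of the sentence as the co-occurrence window are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent token pairs by pointwise mutual information (PMI).

    A high PMI means the two tokens appear together more often than
    chance predicts, i.e. the pair is internally "tight".
    min_count is a hypothetical frequency cutoff, not a value from the thesis.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # skip rare pairs, whose PMI estimates are unreliable
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


def cooccurrence_counts(sentences, candidates):
    """Count how often two candidate terms occur in the same sentence.

    sentences: list of token lists; candidates: set of candidate terms.
    Keys are alphabetically ordered pairs, so (a, b) and (b, a) are merged.
    """
    counts = Counter()
    for sentence in sentences:
        present = sorted(set(sentence) & set(candidates))
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                counts[(a, b)] += 1
    return counts
```

Under these assumptions, pairs with high PMI would be kept as candidate multi-word terms, and candidates that rarely co-occur with other high-frequency domain words could then be filtered out. The thresholds for both steps would have to be tuned on the actual corpus.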

Keywords/Search Tags:

Domain Term Extraction, Word Co-occurrence, Domain Concept Connotation Acquisition, Domain Concept Extension (Instances) Studying

Related items

1	A Research On Methods Of Knowledge Acquisition From Domain-Specific Texts And Their Application In Knowledge Acquisition From Archaeological Texts
2	Research And Implementation On Semi-Automatic Domain Ontology Acquisition Method
3	Concepts Acquisition From Specific Domain Corpora
4	Research On Domain-Specific Term Extraction Based On Semi-Supervised Learning
5	Construction And Implementation Of Domain Ontology Based On Plain Text
6	The Research Of Domain Term Extraction And Constructed Model Of Semantic Conceptual Graphs
7	Research On Domain Ontology Concept And Relation Acquisition
8	A Method Of Domain Compound Concept Extraction Based On Multilevel Filter
9	Research Of Chinese Word Segmentation Oriented To Special Domain
10	The Study Of Domain Dictionary Construction Based On Web