Research On Ontology-based Knowledge Base Classification

Posted on:2014-02-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:C Y Zhu

Full Text:PDF

GTID:1228330398972872

Subject:Computer application technology

Abstract/Summary:

Linguistic knowledge bases are a fundamental resource for natural language processing (NLP). The completeness, representation, and organization of the knowledge directly affect the performance of knowledge-based NLP applications. Most taxonomy-based knowledge bases were built from human-oriented dictionaries, so they have low coverage and a long update cycle, and storing domain knowledge bases in isolation makes it hard to share knowledge and reduce redundancy. In addition, many existing NLP applications use only word-level knowledge; few of them exploit semantic knowledge about concepts and the relationships among concepts, which limits their performance.

To address these problems, this dissertation proposes a domain label assignment method based on manually constructed machine-readable dictionaries, which can be used to build domain dictionaries automatically. By using the well-defined taxonomy and the formal description of concepts in an ontology, the storage, representation, and sharing of knowledge in existing knowledge bases can be improved. The main contributions of this dissertation are summarized as follows:

1. A word domain assignment method based on word glosses is proposed. Domain-specialized dictionaries and a general dictionary are used to train a labeling model, which is then used to automatically add domain labels to new words in the general dictionary. This method effectively reduces labor cost while improving the coverage of the knowledge base.

2. A method for generating an adaptive hierarchical classification system is proposed in Chapter 3, together with a hierarchical domain assignment method based on the automatically generated classification system. The method uses vocabulary information to analyze the relevance between domains. On this basis, a hierarchical classification tree is generated automatically and then used in the top-down hierarchical domain labeling step.

3. A new ontology-based conceptualized feature description model, C-VSM, is proposed to resolve the polysemy and synonymy problems in domain terminology. Words in the text are mapped to concept nodes in the ontology: polysemous words are disambiguated by word sense, and synonyms are merged. This reduces the number of features and increases the weight of the main features, which improves the efficiency of text representation. Both the training documents and new documents are represented in C-VSM and then passed to a traditional classifier.

4. The C-VSM model is introduced into text classification, and the related techniques are discussed, including feature selection, feature weight calculation, and text similarity calculation. A new balanced feature selection method that combines information gain and document frequency is presented to improve classification performance. The feature weights are also adjusted, based on an analysis of the semantic relations between concepts, to improve the text similarity calculation.
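The concept mapping behind C-VSM can be illustrated with a minimal sketch. It is not the dissertation's implementation: the toy ontology tables (`SYNONYMS`, `POLYSEMOUS`), the concept identifiers, and the function `map_to_concepts` are all hypothetical, and word sense disambiguation is approximated here by a simple context-overlap count, which is an assumption rather than the method described in the abstract.

```python
from collections import Counter

# Hypothetical toy ontology: synonymous terms share one concept node.
SYNONYMS = {
    "car": "C_AUTOMOBILE",
    "automobile": "C_AUTOMOBILE",
}

# Hypothetical polysemous terms: each candidate concept lists cue words.
POLYSEMOUS = {
    "bank": {
        "C_FINANCIAL_INSTITUTION": {"money", "loan", "account"},
        "C_RIVER_BANK": {"river", "water", "shore"},
    },
}


def map_to_concepts(tokens):
    """Return a concept-frequency vector for a tokenized document.

    Polysemous terms are resolved to the candidate concept whose cue words
    overlap most with the document (assumed heuristic); synonyms are merged
    into a shared concept; unknown terms are kept as plain word features.
    """
    tokens = [t.lower() for t in tokens]
    context = set(tokens)
    concepts = []
    for token in tokens:
        if token in POLYSEMOUS:
            senses = POLYSEMOUS[token]
            best = max(senses, key=lambda c: len(senses[c] & context))
            concepts.append(best)
        else:
            concepts.append(SYNONYMS.get(token, token))
    return Counter(concepts)


v1 = map_to_concepts("the bank approved a loan for the car".split())
v2 = map_to_concepts("the automobile stopped near the river bank".split())
```

In this sketch, "car" and "automobile" fall onto the same feature in both vectors, while "bank" is split into two distinct concept features depending on its context. The resulting vectors could then be weighted and passed to a conventional classifier.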

Keywords/Search Tags:

ontology, knowledge base, domain assignment, hierarchical classification, text classification, conceptualization, feature selection, feature weighting

Related items

1	Short Text Classification Method Combining Statistical Information And Conceptual Information Of Knowledge Base
2	Research On The Methods Of Domain Semantic Knowledge Base Construction And Knowledge Service
3	The Research On Conducting Chemical Domain Text Classifier Based On Hownet
4	Ontology-based Construction Of Hierarchical Categorization System Of Domain Knowledge Base
5	Research On Text Classification Based On Feature Selection And Feature Weighting Algorithm
6	Domain Knowledge Base Constructing And Integrate Realization Based On Ontology
7	Short Text Classification Based On Integration Of Ontology And BTM Feature Extension
8	Research On Text Classification Based On Domain Ontology
9	Research On Chi-square Statistic Feature Selection Method And TF-IDF Feature Weighting Method For Chinese Text Classification
10	Research On Construction And Application Of Ontology Based Knowledge Base In Inner Mongolia Autonomous Region's Tourism Domain