Research On Text Classification Based On Domain Ontology

Posted on:2013-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:T T Wei

Full Text:PDF

GTID:2248330371489411

Subject:Computer application technology

Abstract/Summary:

With the rapid growth in the volume and spread of information, the number of documents on the Internet is increasing exponentially. People are surrounded by huge amounts of information, and it is difficult for them to find what they are interested in accurately and quickly. How to organize this massive data and classify it accurately has therefore become a significant issue in information technology. Text classification is a key technology for organizing and managing information, and it helps users locate the information they want quickly, so the demands placed on it keep rising. Traditional text classification algorithms use keywords as features to build a vector space model in which the keywords are treated as mutually independent, with no semantic association between them. Such a model loses much semantic information, cannot express the main content of the text, and degrades the classification results. With the emergence of the Semantic Web, semantic-based text classification has become an effective way to improve on the traditional method. Because ontologies are well structured and can express more semantic information, they are widely used in semantic text classification. Although semantic text classification algorithms have developed rapidly, they still face several problems: ontologies are mostly used only at the dictionary level, and the semantic relationships among terms are not studied in depth; the concept vector space model does not include ontology properties and instances, so it cannot express the semantics of the text well; and most algorithms ignore the most useful capability of ontologies, namely reasoning.
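The limitation of the keyword-based vector space model can be illustrated with a minimal sketch. The function names and the two sample sentences below are assumptions made for illustration and do not come from the thesis. Because every distinct keyword is its own dimension, two texts about the same topic that share no words receive a similarity of zero.

```python
import math
from collections import Counter

def keyword_vector(text):
    """Bag-of-words term counts; each distinct keyword is its own dimension."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two hypothetical tourism snippets with the same meaning but no shared keywords.
doc_a = keyword_vector("cheap hotel near beach")
doc_b = keyword_vector("budget inn by seaside")
similarity = cosine(doc_a, doc_b)  # no common dimension, so the dot product is 0
```

Here `similarity` is zero even though both snippets describe inexpensive seaside lodging, which is the loss of semantic information described above.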
After thoroughly surveying the current state of traditional and ontology-based classification methods, this thesis proposes a method to address the existing problems. The main work is as follows:

(1) The thesis introduces the relevant background on ontologies, the principles and methods of ontology construction, and the description language OWL 2. It describes in detail the construction of a tourism domain ontology. It also introduces the key techniques of the text classification process, including its definition, text representation, feature extraction and selection, and commonly used classifiers.

(2) The primary problem in text categorization is the text representation model. To address the lack of semantic information in existing text representation methods, a new text representation model is proposed. It is based on concept mapping: terms are mapped not only to ontology concepts but also to the ontology's properties and instances, so that the semantic relations among terms are fully expressed. Because an ontology concept carries more semantic information than a common term, the traditional statistics-based weight calculation cannot fully express the significance of a concept in the text. The thesis therefore proposes an improved method that assigns more weight to concepts that carry more semantic information.

(3) Traditional machine learning methods have high computational complexity and are sensitive to the size of the training set. The thesis puts forward a method that takes the structure of the ontology as the classification standard and is realized by combining the degree of semantic correlation between concepts and terms with the reasoning ability of the ontology. Each text is classified under an ontology concept as an individual. Experiments show that this method achieves higher accuracy than the Bayes and KNN classifiers.

(4) To make full use of the ontology during classification and thereby improve classification efficiency, ontology reasoning rules are incorporated into the classification method. The ontology reasoning mechanism can supply implicit knowledge and semantic information for classification, which reduces the cost of computation. Experiments show that the classification method combined with reasoning rules achieves higher efficiency.

(5) Taking the tourism domain as its background, the thesis uses a crawler to collect web pages related to travel information and applies the proposed method to classify tourism web texts. The specific process of each module is described, including preprocessing, generation of the concept vector space model, and the classification process. An analysis and summary of the experiments is then given.
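The concept-mapping idea in items (2) and (3) can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy term-to-concept table, the boost factors in `KIND_WEIGHT`, and the highest-score decision rule are all hypothetical, since the abstract does not give the actual ontology, weighting formula, or reasoning rules.

```python
from collections import Counter

# Hypothetical toy tourism ontology: surface term -> (top-level concept, kind of match).
TERM_TO_CONCEPT = {
    "hotel": ("Accommodation", "concept"),
    "inn": ("Accommodation", "concept"),
    "hilton": ("Accommodation", "instance"),
    "price": ("Accommodation", "property"),
    "museum": ("Attraction", "concept"),
    "louvre": ("Attraction", "instance"),
}

# Assumed boost factors; the abstract only says that entries carrying more
# semantic information receive more weight, not how much.
KIND_WEIGHT = {"concept": 2.0, "instance": 1.5, "property": 1.2}

def concept_vector(text):
    """Map tokens to ontology concepts, properties, and instances and sum their weights."""
    scores = Counter()
    for token in text.lower().split():
        if token in TERM_TO_CONCEPT:
            concept, kind = TERM_TO_CONCEPT[token]
            scores[concept] += KIND_WEIGHT[kind]
    return scores

def classify(text):
    """Assign the text to the concept with the highest accumulated weight, if any."""
    scores = concept_vector(text)
    return max(scores, key=scores.get) if scores else None
```

With this table, `classify("the louvre museum")` selects `Attraction`, and a text with no mapped terms returns `None`. Unlike a plain keyword vector, "hotel" and "inn" now contribute to the same dimension.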

Keywords/Search Tags:

text classification, semantic correlation, domain ontology, ontology reasoning

Related items

1	Research On Domain Ontology Model And Semantic Reasoning Of Animation Material
2	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications
3	Research On Mechanism Of Semantic Association Based On Ontology Of Petroleum Domain
4	Research On High-speed Railway Ontology Integration And Reasoning Based On Semantic Relationships
5	A Research On Semantic Relevancy Computational Method For Text Based On Hypertension Domain Ontology
6	Research On Ontology Reasoning Based On Description Logic In Semantic Web
7	The Design And Implementation Of Content Filtration Model Based On Domain Ontology
8	Domain Ontology Extraction On Unstructured Text
9	Research On Rule-based Reasoning For Domain Ontology
10	The Research Of Ontology Matching Based On Text Classification