The Research Of Text Classification Based On Ontology

Posted on:2009-07-02

Degree:Master

Type:Thesis

Country:China

Candidate:X Wu

GTID:2178360248452612

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet and enterprise intranets, the volume of electronic documents has grown enormously. It has become difficult for users to find information that satisfies their needs, so organizing and processing massive document collections quickly, accurately, and completely has become a major challenge for information science and technology. Text classification is a key technology for processing and organizing massive text data; it can largely relieve the problem of information disorder and help people locate the required information quickly. Text classification is therefore becoming increasingly important.

In this thesis, we introduce text classification and its related technologies and point out a flaw of traditional text classification: keywords are treated as mutually independent, with no semantic connection between them, which is obviously contrary to fact. Documents are considered related only if they share keywords, so relations between keywords such as synonymy, hypernymy, and hyponymy are not captured. The difficulty is that, on the one hand, most keywords have multiple meanings and, on the other hand, some concepts can be described by more than one keyword. To solve this problem, we match keywords with the concepts of a domain ontology and represent each document with a Concept Vector Space Model (CVSM). This approach preserves most of the text information and effectively improves the precision of text classification. The main work is as follows.

1. The key technologies of text classification are analysed, including the definition of text classification, feature selection, classification methods, and performance evaluation. Finally, we point out the existing problems of traditional text classification.
2. Text classification is closely related to personalized information retrieval. Research on personalized information retrieval is discussed, and an adjustment algorithm for user profiles is presented.
3. Ontology is introduced. Keywords are matched with concepts of a domain ontology, so that the Vector Space Model (VSM) of a document is transformed into a Concept Vector Space Model (CVSM) in order to handle synonymy, hypernymy, and hyponymy.
4. A text classification system architecture based on CVSM is presented, and each module is analysed. A series of simulation tests indicates that the precision and recall of text classification are effectively improved.

Finally, we summarize the thesis and discuss future work.
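
The keyword-to-concept mapping described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy ontology (`KEYWORD_TO_CONCEPT`, `PARENT`), the hypernym discount `HYPERNYM_WEIGHT`, and the choice to drop keywords that are not in the ontology are all assumptions made for illustration, and word-sense disambiguation is not handled. The sketch only shows why two documents with no shared keywords can still be similar once they are expressed as concept vectors.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (an assumption for this sketch, not from the thesis):
# each keyword maps to one concept, and a concept may have one parent (hypernym).
KEYWORD_TO_CONCEPT = {
    "car": "Automobile",
    "automobile": "Automobile",
    "truck": "Truck",
    "engine": "Engine",
}
PARENT = {"Automobile": "Vehicle", "Truck": "Vehicle"}
HYPERNYM_WEIGHT = 0.5  # assumed discount applied to the parent concept


def keyword_vector(tokens):
    """Plain VSM: one dimension per keyword, weighted by term frequency."""
    return Counter(tokens)


def concept_vector(tokens):
    """CVSM-style vector: one dimension per ontology concept."""
    vec = Counter()
    for tok in tokens:
        concept = KEYWORD_TO_CONCEPT.get(tok)
        if concept is None:
            continue  # assumption: keywords outside the ontology are dropped
        vec[concept] += 1.0
        parent = PARENT.get(concept)
        if parent is not None:
            vec[parent] += HYPERNYM_WEIGHT  # credit the hypernym as well
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


doc_a = ["car", "engine"]
doc_b = ["automobile", "truck"]

# No keyword is shared, so the keyword-based similarity is zero.
print(cosine(keyword_vector(doc_a), keyword_vector(doc_b)))
# The synonym and the shared hypernym make the concept-based similarity positive.
print(cosine(concept_vector(doc_a), concept_vector(doc_b)))
```

In a real system the concept vectors would typically be weighted (for example with TF-IDF) and passed to a classifier; the abstract does not say which weighting or classifier is used, so the sketch stops at the similarity computation.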

Keywords/Search Tags:

Text Categorization, User Profile, Ontology, CVSM, Feature Selection

Related items

1	The Research Of Text Representation And Feature Selection In Text Categorization
2	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
3	A Study On Text Categorization Based On Machine Learning
4	Normal Weight Based Feature Selection Method In SVM Text Categorization
5	Related Technologies Research On Feature Selection For Text Categorization
6	Feature Selection Methods For Text Categorization
7	χ² Statistics-based Chinese Text Categorization Feature Selection Method
8	Research On Text Categorization Based On LDA And SVM
9	Research On High-Performance Feature Selection And Text Categorization
10	Application And Research Of Feature Selection Method In Chinese Text Categorization