The Research Of Text Classification Based On Ontology

Posted on: 2009-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Wu
Full Text: PDF
GTID: 2178360248452612
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet and enterprise intranets, the volume of electronic documents of all kinds has grown enormously. It has become difficult for users to find information that satisfies their needs, so organizing and processing massive document collections quickly, accurately, and completely has become a major challenge for information science and technology. Text classification is a key technology for processing and organizing massive text data; it can largely resolve the problem of information disorder and help people locate the required information quickly. Text classification is therefore becoming more and more important.

In this thesis, we introduce text classification and its related technologies and point out a flaw of traditional text classification: it assumes that keywords are mutually independent and have no semantic connection, which obviously contradicts reality. Under this assumption, documents are considered related only if they share keywords, so semantic relations between keywords such as synonymy, hypernymy, and hyponymy are not captured. The difficulty is that, on the one hand, most keywords have multiple meanings and, on the other hand, some concepts can be described by more than one keyword. To solve this problem, we match keywords with the concepts of a domain Ontology and represent each document with the Concept Vector Space Model (CVSM). This approach preserves most of the information in the text and effectively improves the precision of text classification. The main work is as follows.

1. The key technologies of text classification are analysed, including the definition of text classification, feature selection, classification methods, and performance evaluation. We then point out the existing problems of traditional text classification.
2. Text classification is closely related to personalized information retrieval. The research on personalized information retrieval is discussed, and an algorithm for adjusting the user profile is presented.
3. Ontology is introduced. Keywords are matched with the concepts of a domain Ontology, so that the Vector Space Model (VSM) of a document is transformed into the Concept Vector Space Model (CVSM) in order to handle synonymy, hypernymy, hyponymy, and similar relations.
4. A text classification system architecture based on CVSM is presented, and each module is analysed. A series of simulation tests indicates that the precision and recall of text classification are effectively improved.

Finally, we summarize the thesis and discuss directions for future work.
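The abstract does not give the details of the keyword-to-concept matching, so the following is only a minimal sketch of the general idea of turning a keyword vector (VSM) into a concept vector (CVSM). The toy mapping `KEYWORD_TO_CONCEPT`, the function names, and the choice to keep unmapped keywords unchanged are all assumptions made for illustration; they are not taken from the thesis. The sketch also ignores keywords with multiple meanings, which a real system would need to disambiguate.

```python
import math
from collections import Counter

# Hypothetical toy ontology (an assumption for illustration):
# each keyword maps to exactly one concept identifier.
KEYWORD_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "physician": "Doctor",
    "doctor": "Doctor",
}


def keyword_vector(tokens):
    """Plain VSM: term frequency of each keyword."""
    return Counter(tokens)


def concept_vector(tokens, mapping=KEYWORD_TO_CONCEPT):
    """CVSM sketch: count concepts instead of keywords.

    Keywords without a concept in the mapping are kept unchanged
    (an assumption, so that their information is not discarded).
    """
    return Counter(mapping.get(tok, tok) for tok in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = ["car", "physician"]
doc2 = ["automobile", "doctor"]

# The documents share no keywords, so the keyword vectors are orthogonal.
print(cosine(keyword_vector(doc1), keyword_vector(doc2)))
# After mapping synonyms to shared concepts, the vectors coincide.
print(cosine(concept_vector(doc1), concept_vector(doc2)))
```

In this toy case the two documents have no keywords in common, yet their concept vectors are identical, which is the kind of synonym effect the concept-based representation is meant to capture.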
Keywords/Search Tags: Text Categorization, User Profile, Ontology, CVSM, Feature Selection