The Research Of Text Classification Based On Concept

Posted on: 2009-12-16
Degree: Master
Type: Thesis
Country: China
Candidate: M M Jiang
GTID: 2178360242489404
Subject: Computer software and theory
Abstract/Summary:
The rapid growth of information on the web makes it hard for people to find what they want. Traditional manual methods of organizing and managing large document collections are time-consuming and laborious, and they also classify poorly. Automatic text classification, a technology for organizing and managing information, was therefore introduced to bring order to information retrieval, and it has attracted widespread attention and developed rapidly.

In this thesis, a concept-based text representation method is proposed and applied to text classification, in order to address the fact that the vector space model (VSM) does not take semantic relations between terms into account. A text classification system based on this method is also implemented.

First, we analyze the ways in which natural language processing handles the semantic level, and review the current state and development trends of text classification based on WordNet.

Second, we introduce the main stages of text classification based on the traditional VSM, focusing on the technologies related to text representation and on two classical classification algorithms.

Then, we use WordNet to resolve word-sense ambiguity in the text. A concept-based text representation is proposed on this basis and applied to classification with SVM and KNN.

Finally, we carry out two groups of experiments. The first compares the concept-based text representation (CVSM) with VSM on the Reuters RCV1 news text collection. The results show that the precision, recall and F1 measure of CVSM are all higher than those of VSM, which indicates that CVSM achieves better classification performance. The second compares SVM and KNN on the same data sets and text representation. It shows that SVM achieves better performance than KNN.
Keywords/Search Tags: Text classification, WordNet, Concept Vector, SVM, KNN
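
The abstract does not give the details of the concept-based representation, so the following is only a minimal sketch of the general idea, not the thesis's actual method. It assumes each word has already been disambiguated and mapped to a single concept. The `TOY_CONCEPTS` table, the concept identifiers and the two sample documents are hypothetical stand-ins for a WordNet lookup. The sketch shows how two documents that use different words for the same concepts look dissimilar under a plain term vector but similar under a concept vector.

```python
import math
from collections import Counter

# Hypothetical word -> concept table standing in for a WordNet synset lookup
# after word sense disambiguation. The identifiers are illustrative only.
TOY_CONCEPTS = {
    "car": "concept:motor_vehicle",
    "automobile": "concept:motor_vehicle",
    "bank": "concept:financial_institution",
    "lender": "concept:financial_institution",
}


def term_vector(tokens):
    """Plain VSM representation: raw term frequencies."""
    return Counter(tokens)


def concept_vector(tokens):
    """Concept-based representation: count concepts instead of surface words.

    Words missing from the table are kept as their own dimension.
    """
    return Counter(TOY_CONCEPTS.get(token, token) for token in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two hypothetical documents that share concepts but few surface words.
doc_a = "car bank loan".split()
doc_b = "automobile lender loan".split()

vsm_sim = cosine(term_vector(doc_a), term_vector(doc_b))
cvsm_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print(f"VSM similarity:  {vsm_sim:.3f}")
print(f"CVSM similarity: {cvsm_sim:.3f}")
```

In this toy case the term vectors overlap only on "loan", while the concept vectors overlap on every dimension. A real system would replace the lookup table with a disambiguation step over WordNet and would typically weight the resulting concept counts (for example with TF-IDF) before passing them to SVM or KNN.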