
Implementation of Chinese Text Classification Based on the SVM Algorithm

Posted on: 2013-01-03  Degree: Master  Type: Thesis
Country: China  Candidate: H X Wang
GTID: 2218330374965170  Subject: Communication and Information System
Abstract/Summary:
With the rapid development of communication technology, the volume of information of all kinds, and of text in particular, has grown rapidly. To extract useful information from massive, complex text collections quickly and accurately, text representation and automatic text categorization have received widespread attention, and SVM-based text categorization is a current research focus.

This thesis studies the application of the support vector machine (SVM) to Chinese text classification. We experimentally compare the effect of commonly used feature selection methods, analyze why the mutual information feature selection method yields poor classification accuracy, and propose an improved mutual information feature selection method. The experimental results show that the improved method clearly increases classification accuracy.

In addition, because a large number of training samples makes training slow, we reduce the number of training vectors used in the Chinese text classification process to speed up training. An improved density clustering method selects samples from the original training set, and the selected samples form the new training set for the classifier. Finally, we design and implement a Chinese text classifier based on the SVM and evaluate its classification results using accuracy and recall. The results show that the classifier classifies well and has some practical value.
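For readers unfamiliar with mutual information feature selection, the following minimal sketch scores each term against each class using the standard document-count estimate MI(t, c) = log(P(t, c) / (P(t) P(c))). It illustrates only the baseline method, not the improved variant proposed in the thesis, whose details are not given in this abstract. The function names, the max-over-classes selection rule, and the assumption that documents are already segmented into word tokens are all illustrative choices, not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs, labels):
    """Return {(term, label): MI score} estimated from document counts.

    docs   -- list of token lists (assumed already word-segmented)
    labels -- list of class labels, one per document
    """
    n_docs = len(docs)
    class_count = Counter(labels)      # number of documents in each class
    term_count = Counter()             # number of documents containing each term
    joint_count = defaultdict(int)     # documents of a class that contain a term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):       # count each term once per document
            term_count[term] += 1
            joint_count[(term, label)] += 1

    scores = {}
    for (term, label), joint in joint_count.items():
        # MI(t, c) = log( P(t, c) / (P(t) * P(c)) ), with probabilities
        # replaced by relative document frequencies.
        scores[(term, label)] = math.log(
            joint * n_docs / (term_count[term] * class_count[label])
        )
    return scores


def select_features(docs, labels, k):
    """Keep the k terms with the highest MI over any class (hypothetical rule)."""
    best = defaultdict(lambda: float("-inf"))
    for (term, _), score in mutual_information(docs, labels).items():
        best[term] = max(best[term], score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:k]
```

The selected terms would then define the vector space in which documents are represented before SVM training. A real Chinese pipeline would also need a word segmentation step before this function is called.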
Keywords/Search Tags:Text classification, Feature selection, Support vector machine, Density clustering