
Study On Text Classification Based On SVM

Posted on: 2008-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Dai
GTID: 2178360212981204
Subject: Computer software and theory
Abstract/Summary:
Automatic text classification is the core technology of content-based automatic information classification: it is the process of assigning texts to categories by computer. Text classification has several distinctive characteristics: text vectors are very sparse and high-dimensional, and the features are correlated with one another. These properties make the support vector machine (SVM) well suited to the task. Applying SVM to text classification nevertheless poses many challenges: there may be too many categories, too many samples, and too much noise, and SVM classification is slow.

This thesis speeds up classification by reducing the number of text vectors. It selects the decisive samples from the original set using a density clustering algorithm, then uses these samples as a new training set to train the SVM classifier. The decisive samples are usually the points distributed near the edges, which correspond to the support vectors in SVM. The goal is to find these samples in the original set.

Using a common density clustering algorithm directly is not a good approach, because its high time complexity makes the overall classification process very inefficient. This thesis therefore uses an improved density clustering algorithm that incorporates features of the hierarchical clustering algorithm CURE. The improved algorithm retains the sensitivity to edge points while reducing the time complexity of density clustering. In addition, extensive experiments were carried out to find a method that sets the initial clustering parameters dynamically. This method is more effective than the earlier approach, in which the parameters had to be set manually.
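The idea of keeping only edge samples can be illustrated with a minimal sketch. It uses a plain DBSCAN-style definition of a border point (not itself dense, but within reach of a dense point), which is an assumption made here for illustration; it is not the improved CURE-based algorithm of the thesis, whose details the abstract does not give. The function name `border_samples` and the parameters `eps` and `min_pts` are hypothetical.

```python
from math import dist


def border_samples(points, eps, min_pts):
    """Return indices of DBSCAN-style border points (illustrative only)."""
    n = len(points)
    # Neighbourhood of each point within radius eps (the point itself included).
    neighbours = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # A core point has at least min_pts points in its neighbourhood.
    core = [len(nb) >= min_pts for nb in neighbours]
    # A border point is not core itself but lies within eps of a core point.
    # Points that are neither core nor border are treated as noise and dropped.
    return [
        i for i in range(n)
        if not core[i] and any(core[j] for j in neighbours[i])
    ]


# Five evenly spaced points forming one cluster, plus one distant outlier.
cluster = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (10.0,)]
print(border_samples(cluster, eps=1.0, min_pts=3))  # [0, 4]
```

In this toy case only the two end points of the cluster are kept and the outlier is discarded as noise; the reduced set would then be passed to an SVM trainer. The pairwise distance computation is quadratic in the number of samples, which is consistent with the abstract's remark that a common density clustering algorithm is too expensive to use directly.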
Keywords/Search Tags: SVM, Text Classification, Density Clustering