
Text Categorization Research Based On Support Vector Machine

Posted on: 2008-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Shi
GTID: 2178360212974232
Subject: Computer application technology
Abstract/Summary:
As a theory of learning from limited samples, Statistical Learning Theory (SLT) has many advantages in pattern recognition, notably on small-sample, nonlinear and high-dimensional problems. The Support Vector Machine (SVM) is one of the learning methods developed from SLT. In small-sample learning it finds the optimal solution under the limited information available, and it largely resolves problems such as model selection, overfitting, nonlinearity and the curse of dimensionality. The kernel, proposed and developed in the study of SVMs, is a new way of constructing nonlinear maps. Good kernels mean good SVMs, so the study of kernel functions is one of the focuses of SVM research. The contributions of this thesis are as follows:
(1) Research contributions and major open problems in statistical learning theory are reviewed. Basic concepts and theories of the SVM are summarized, and problems in current research are put forward.
(2) Extensive experiments yield useful observations, including the impact of text features on the final classification, an analysis and comparison of text categorization performance, and the relation between the choice of kernel parameters and text features. These observations indicate how to create new kernels, choose suitable parameters and improve existing kernels.
(3) An improved method for choosing kernel parameters is proposed, based on text features and on an analysis of existing approaches to kernel parameter optimization. The new method is applied to the Radial Basis Function (RBF) kernel and to mixed kernels, and the results show that it achieves better classification performance, especially in generalization ability.
(4) Based on the sample distribution, an optimization method for the text categorization classifier (RBF kernel) is studied. A simplified algorithm is then proposed to optimize the parameters of the text classifier. This method was used to choose the RBF kernel's parameters on the Reuters-21578 data set. The good experimental results demonstrate the effectiveness of the improved methods.
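To make the kernel terminology concrete, the sketch below computes an RBF kernel, a mixed kernel, and a data-driven starting value for the RBF width. It is an illustration only, not the method of the thesis: the abstract does not define its mixed kernel or its parameter-selection algorithm. The mixed kernel is assumed here to be a convex combination of an RBF and a polynomial kernel, the width is picked with the median heuristic (a common rule of thumb, used as a stand-in), and all function names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations
from statistics import median


def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def rbf_kernel(x, y, gamma):
    """RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * squared_distance(x, y))


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: K(x, y) = (<x, y> + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def mixed_kernel(x, y, gamma, weight=0.5, degree=2):
    """Assumed form of a mixed kernel: convex combination of RBF and polynomial."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y, gamma) + (1.0 - weight) * poly_kernel(x, y, degree)


def median_heuristic_gamma(samples):
    """Stand-in heuristic: gamma = 1 / median of nonzero squared pairwise distances."""
    distances = [squared_distance(a, b) for a, b in combinations(samples, 2)]
    positive = [d for d in distances if d > 0]
    if not positive:
        raise ValueError("need at least two distinct samples")
    return 1.0 / median(positive)


# Toy term-weight vectors standing in for document features (made-up values).
docs = [
    [0.0, 1.0, 0.0],
    [0.0, 0.8, 0.6],
    [1.0, 0.0, 0.0],
]
gamma = median_heuristic_gamma(docs)
similar = rbf_kernel(docs[0], docs[1], gamma)      # documents sharing a term
dissimilar = rbf_kernel(docs[0], docs[2], gamma)   # documents with no shared term
```

In practice a heuristic value like this would only seed a search; the final parameters are usually chosen by validating classification performance on held-out data.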
Keywords/Search Tags: SVM, text categorization, kernel functions, text features, parameters choosing