Research On Chinese Text Categorization Based On Support Vector Machine

Posted on:2010-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:F Jiang

Full Text:PDF

GTID:2178360278962391

Subject:Computer system architecture

Abstract/Summary:

With the rapid development and growing popularity of the Internet, more and more information exists online in the form of electronic documents. Extracting valuable knowledge from this mass of documents has become a major goal of information processing, and text categorization, an important part of that field, has become a major research direction. With text categorization techniques, documents can be processed automatically according to an organization's classification scheme, which helps people locate the information they need accurately. As an underlying technology for information filtering, information retrieval, search engines and other areas, text categorization also has broad application prospects.

The categorization algorithm is the most critical factor in the performance of a text categorization system. The support vector machine (SVM) is a machine learning technique developed by Vapnik from statistical learning theory. It is widely studied and used for text categorization because of its good generalization performance, its globally optimal solution and its simple structure.

This thesis studies the text categorization problem and examines the SVM kernel function in depth. After analyzing the traditional polynomial kernel and its poor learning performance, we combine it with a conditionally positive definite kernel, which has strong learning performance, to obtain an improved polynomial SVM classifier for text categorization. The main work is as follows:

① We discuss the key techniques in text categorization: feature selection, feature weighting and categorization algorithms. We compare the advantages and disadvantages of the feature selection algorithms and categorization algorithms commonly used in text categorization.

② We introduce a class of kernels that do not satisfy the Mercer condition but can still be used for kernel learning. We analyze the advantages and disadvantages of these conditionally positive definite kernels and apply them to text categorization.

③ We analyze the characteristics of the polynomial kernel function. To address its poor learning performance, we combine it with a conditionally positive definite kernel, which has good learning ability, to form a mixed kernel function. The resulting improved polynomial kernel SVM text classifier has both good generalization performance and good learning performance, and its structure is intrinsically related to the similarity measure between text vectors.

④ To verify the improvement, we compare the improved polynomial kernel function with the original polynomial kernel function on the same data sets. The experimental results show that the improved polynomial kernel SVM text classifier is superior to the polynomial kernel SVM text classifier.

⑤ During the experiments, we found that the first-order polynomial kernel function and the second-order conditionally positive definite kernel function always gave the same classification results on three different data sets. We therefore conjecture that an SVM using the first-order polynomial kernel function is equivalent to an SVM using the second-order conditionally positive definite kernel function.
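The abstract does not give the kernel formulas, so the following is only a minimal sketch under stated assumptions. It assumes the conditionally positive definite kernel is the power-distance kernel k(x, y) = -||x - y||^beta with 0 < beta <= 2, and that the mixed kernel is a weighted sum of this kernel and a polynomial kernel; the names `poly_kernel`, `cpd_kernel`, `mixed_kernel`, `quad_form` and the `weight` parameter are hypothetical and not taken from the thesis. Under these assumptions, expanding -||x - y||^2 = -||x||^2 - ||y||^2 + 2<x, y> shows that for coefficients summing to zero (as the SVM dual constraint requires) the quadratic form of the exponent-2 kernel is exactly twice that of the degree-1 polynomial kernel, which may be one reason the two classifiers were observed to behave alike.

```python
import random


def poly_kernel(x, y, degree=1, coef0=0.0):
    """Polynomial kernel (<x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def cpd_kernel(x, y, beta=2.0):
    """Assumed CPD kernel -||x - y|| ** beta (not confirmed by the source)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return -(sq_dist ** (beta / 2.0))


def mixed_kernel(x, y, degree=2, beta=1.0, weight=0.5):
    """Hypothetical mixed kernel: weighted sum of the two kernels above."""
    return (weight * poly_kernel(x, y, degree, 1.0)
            + (1.0 - weight) * cpd_kernel(x, y, beta))


def quad_form(kernel, points, coeffs):
    """Quadratic form sum_ij c_i c_j k(x_i, x_j) that appears in the SVM dual."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(coeffs, points)
               for cj, xj in zip(coeffs, points))


# Random points and coefficients; the data here is synthetic, for illustration.
random.seed(0)
points = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(8)]
coeffs = [random.gauss(0.0, 1.0) for _ in range(8)]

# Enforce sum(coeffs) == 0, mirroring the dual constraint sum_i alpha_i y_i = 0.
mean = sum(coeffs) / len(coeffs)
coeffs = [c - mean for c in coeffs]

lin_value = quad_form(poly_kernel, points, coeffs)  # degree-1 polynomial kernel
cpd_value = quad_form(cpd_kernel, points, coeffs)   # exponent-2 CPD kernel

# Under the zero-sum constraint the two agree up to a factor of 2.
print(abs(cpd_value - 2.0 * lin_value) < 1e-9)
```

The factor of 2 corresponds to a rescaling of the kernel, which in a soft-margin SVM can be absorbed into the penalty parameter. Whether this matches the kernels actually used in the thesis cannot be determined from the abstract alone.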

Keywords/Search Tags:

Support Vector Machines, Polynomial Kernel, Conditionally Positive Definite Kernel, Text Categorization, Feature Selection

Related items

1	Research On Support Vector Machines Classification Algorithm In Text Categorization
2	Research On Kernel Learning Based On Support Vector Machines
3	Kernels For Feature Extraction And Research On Nonlinear Multiple Kernel Learning
4	A Study On Text Categorization Based On Machine Learning
5	The Research Of Semi-Definite Programming SVM
6	The Algorithm Research And Verification Of Support Vector Machines Based On Different Kernel Functions
7	An examination of KSS for feature selection for text categorization using support vector machines
8	Research Of Kernel Methods For Support Vector Machine And Multiple Kernel Clustering Algorithm
9	Research On Indefinite Kernel Support Vector Machine Algorithms
10	Research On Model Selection Of Support Vector Machine