The Research And Application Of Automatic Text Classifier Based On Support Vector Machine

Posted on: 2013-07-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Huang
GTID: 2248330371981090
Subject: Computer technology
Abstract/Summary:
Automatic text categorization is the process of automatically assigning a text to a category, using text mining and analysis, within a given classification system. With the rapid growth of digital information, more and more fields need automatic text classification techniques, such as digital libraries, electronic conferencing, and information retrieval and filtering. How to construct an effective automatic text classifier is therefore an active research topic in machine learning. The support vector machine (SVM) is a machine learning method built on statistical learning theory, specifically the structural risk minimization principle and the VC dimension. Training is cast as a quadratic programming problem, so the solution obtained is a global optimum. SVMs can effectively handle small-sample, nonlinear, and high-dimensional pattern recognition problems, effectively overcome the curse of dimensionality, and have good learning and generalization ability. Text categorization data has sparse, high-dimensional sample vectors, a large training set, and many "noise" samples, and is generally not linearly separable in a low-dimensional feature space.
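For reference, the quadratic programming problem mentioned above is usually written in the following soft-margin dual form. This is the standard textbook formulation, not one quoted from the thesis; the symbols $\alpha_i$ (Lagrange multipliers), $y_i \in \{-1, +1\}$ (class labels), $K$ (kernel function), $b$ (bias), and $n$ (number of training samples) are notation assumed here.

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0

f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

Because the objective is concave in $\alpha$ and the constraints are linear, any local optimum is also global. The training samples with $\alpha_i > 0$ are the support vectors, and $C$ bounds how much any single sample can contribute.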
This thesis combines the support vector machine with automatic text classification in order to address three problems in automatic text categorization: very high dimensionality, linear non-separability, and low classification performance. It studies text classification techniques and the SVM algorithm and follows the classification process step by step. The input text is segmented into terms and useless terms are removed. Term statistics are then collected, features are selected with the chi-square statistic, and feature weights are computed with TF-IDF. Each text is then represented as a vector in the vector space model built from these features and weights. Finally, after weighing the advantages and characteristics of different kernel functions, the Gaussian kernel is selected for its strong generalization ability and strong locality, with the parameter σ set to 0.4 and C set to 100. The SVM algorithm is trained on the sample vectors in this feature space to obtain the text classification model.
The thesis also constructs an optimized training method for the SVM classifier. Exploiting the complementarity of the training and test sets, it collects the support vector samples from the training set and the misclassified texts from the test set into a new training set with relatively strong learning and generalization ability, and trains the SVM classifier on that set. Experiments show that the method improves the recall and precision of text classification, which indicates that it is feasible and of practical value.
Finally, the thesis develops a prototype SVM-based automatic text classifier and builds an experimental text set from the Sogou text classification corpus. Experimental testing and comparative analysis show that the prototype achieves higher recall and precision, which supports the validity of the algorithm design.
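The preprocessing steps described above (chi-square feature selection, TF-IDF weighting, and a Gaussian kernel with σ = 0.4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the toy English documents, the L2 normalization of the vectors, and the kernel parameterization exp(-||x - y||² / (2σ²)) are all assumptions made here, and SVM training itself (where C = 100 would apply) is left out.

```python
import math
from collections import Counter


def chi_square(docs, labels, term, category):
    """Chi-square statistic of one term for one category (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1      # term present, document in category
        elif has_term:
            b += 1      # term present, document in another category
        elif in_cat:
            c += 1      # term absent, document in category
        else:
            d += 1      # term absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over all categories."""
    vocab = sorted({t for doc in docs for t in doc})
    cats = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in cats) for t in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def tfidf_vectors(docs, features):
    """Represent each document as an L2-normalized TF-IDF vector over `features`."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for an empty document
        vec = [tf[t] / length * math.log(n / df[t]) if df[t] else 0.0
               for t in features]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec] if norm else vec)
    return vectors


def gaussian_kernel(x, y, sigma=0.4):
    """Gaussian (RBF) kernel; this parameterization is an assumption."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


# Toy, already-segmented documents (hypothetical data, two categories).
docs = [
    ["stock", "market", "rise"],
    ["stock", "fund", "bank"],
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
]
labels = ["finance", "finance", "sports", "sports"]

features = select_features(docs, labels, k=3)
vectors = tfidf_vectors(docs, features)
same_class = gaussian_kernel(vectors[0], vectors[1])
cross_class = gaussian_kernel(vectors[0], vectors[2])
print(features, same_class, cross_class)
```

On this toy data the kernel value for two documents of the same category is much larger than for two documents of different categories, which is the locality property that motivates the Gaussian kernel. With a small σ such as 0.4 the kernel decays quickly with distance, so normalizing the vectors first keeps distances in a bounded range.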
Keywords/Search Tags:Automatic Text Categorization, Support Vector Machine, Feature selection, Kernel function, Classifier