The Design And Application Of SSVM's Text Classification Based On Feature Selection Optimization

Posted on: 2017-01-21 | Degree: Master | Type: Thesis
Country: China | Candidate: C Xie | Full Text: PDF
GTID: 2348330536453391 | Subject: Engineering
Abstract/Summary:
The development of society, the advancement of technology, and the accumulation of knowledge have produced a growing volume of text data, which raises the problem of how to extract useful information from it effectively. Text classification provides strong support for subsequent data query and data analysis, and machine learning algorithms are well suited to the task when text is processed by computer. Among these algorithms, the support vector machine (SVM) performs particularly well owing to its ability to handle high-dimensional data, its use of kernel functions, and its strong generalization ability. The smooth support vector machine (SSVM), an improvement on the SVM, is the main subject of this thesis.

The thesis first analyzes the characteristics of Chinese text and the feature extraction problems that arise in text classification. The chi-square statistic is commonly used to select feature words, which addresses the problem that the vector space model of text requires a high-dimensional feature space. The thesis points out two problems with the chi-square algorithm: it does not take word frequency into account, and it is unbalanced, in that words with a sparser distribution receive higher chi-square values. To solve the first problem we propose a word-frequency factor, and to solve the second we propose a few-words-balance factor; the improvement brought by each optimization is analyzed in detail.

The second contribution is an implementation of the SSVM algorithm for text classification, including the detailed design of the preprocessing, training, and testing processes. Finally, we compare the performance of the SSVM and the SVM in text classification and evaluate the effect of the chi-square optimization. The results show that the SSVM outperforms the SVM in both classification accuracy and speed, and that the optimized feature extraction method improves the accuracy of the classification results. The experimental results are consistent with the theoretical design of the optimization.
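As an illustration of the chi-square feature selection the abstract refers to, here is a minimal sketch of the standard (unmodified) chi-square score for a term and a class, computed from a 2x2 table of document counts. The function names `chi_square` and `select_top_k` are hypothetical, chosen for this example. The thesis's word-frequency factor and few-words-balance factor are not defined in the abstract, so they are not reproduced here.

```python
from typing import Dict, List


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Standard chi-square score for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that do not contain the term
    d: documents outside the class that do not contain the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # A row or column of the table is empty, so the score is undefined;
        # treating it as 0.0 is a convention assumed for this sketch.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k terms with the highest scores (ties broken alphabetically)."""
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

Note that the score depends only on how many documents contain the term, not on how often the term occurs inside each document. This is the word-frequency blind spot the abstract describes.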
Keywords/Search Tags:Text Classification, Smooth Support Vector Machine, Feature Selection