A Short Text Classification Model Based On Combination Of AdaBoost And SVM

Posted on: 2017-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: J H Jia
Full Text: PDF
GTID: 2428330596457387
Subject: Engineering
Abstract/Summary:
With the rise of social networks, people face a large volume of short-text information in their work and daily life. How to obtain the necessary content quickly and accurately has become an urgent problem. Conventional text categorization can address this problem to a certain degree, but it still suffers from drawbacks such as poor classification accuracy and poor generalization performance. Because short texts contain little content and their features are high-dimensional and sparse, traditional classification models handle them poorly. It is therefore necessary to study short text classification in depth. The short text classification model proposed in this paper addresses, to a certain extent, the unsuitability of traditional classification algorithms for short text classification. The main work is as follows.

First, because the classification performance and generalization ability of the support vector machine (SVM) on multi-class data sets are not high, this paper introduces the AdaBoost algorithm and the particle swarm optimization (PSO) algorithm and proposes a classification algorithm referred to as AdaBoost-PSOSVM. The SVM is first optimized by PSO (referred to as PSOSVM); that is, PSO selects the SVM parameters automatically. A strong classifier is then obtained by integrating PSOSVM classifiers through AdaBoost. Experimental results on UCI data sets show that the strong classifier improves classification performance to a certain extent and performs significantly better than both the PSOSVM algorithm and the SVM algorithm.

Second, we propose a feature extraction algorithm (referred to as FEGAX) based on the genetic algorithm and the χ² statistic, because the genetic algorithm and the χ² statistic have the advantages of easy parallel processing and of avoiding local optima. Its basic idea is to use the χ² statistic to pre-select features from the original text sets and then to perform a second round of feature selection with the genetic algorithm. Experiments verify that this feature extraction method improves the classification quality of SVM to a certain extent.

Last but not least, to address the problem that feature sparsity in short texts degrades classification performance, this paper uses the LDA topic model to extend the features after short text preprocessing and FEGAX feature selection. The basic idea is to first train the LDA topic model on the short texts to obtain the corresponding topic distributions. After short text preprocessing and FEGAX feature selection, the topic words with the highest probabilities are added to the feature sets. Finally, the AdaBoost-PSOSVM algorithm is used to classify the short texts and is compared with the PSOSVM and SVM algorithms. The experimental results show that AdaBoost-PSOSVM classifies better than PSOSVM and SVM on the different short-text data sets. We therefore conclude that the AdaBoost-PSOSVM algorithm has good classification performance and high generalization ability.
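
The chi-square pre-selection step described above can be illustrated with a minimal sketch. It is not the thesis's FEGAX implementation: the function names `chi_square` and `preselect_terms` are hypothetical, the score uses the standard 2x2 contingency form of the chi-square statistic, and taking the maximum score over classes is an assumption, since the abstract does not say how per-class scores are combined. The second, genetic-algorithm stage is not shown.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square score of a term for a class from a 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a constant term or class carries no signal
    return n * (a * d - c * b) ** 2 / denom


def preselect_terms(docs, labels, k):
    """Rank terms by their maximum chi-square score over classes (assumed
    combination rule) and return the top k terms plus all scores."""
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]
    class_sizes = Counter(labels)
    vocab = set().union(*doc_sets)
    scores = {}
    for term in vocab:
        has_term = [term in s for s in doc_sets]
        df = sum(has_term)  # number of documents containing the term
        best = 0.0
        for cls, size in class_sizes.items():
            a = sum(1 for h, y in zip(has_term, labels) if h and y == cls)
            b = df - a
            c = size - a
            d = n_docs - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy corpus (invented for illustration): tokenized short texts and labels.
docs = [
    ["goal", "match", "team"],
    ["team", "win", "goal"],
    ["stock", "market", "win"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]
selected, scores = preselect_terms(docs, labels, k=4)
print(selected)
```

On this toy corpus the four terms that appear in only one class are kept, while "win", which occurs equally in both classes, scores zero and is dropped. In a two-stage scheme such as the one the abstract describes, the reduced vocabulary would then be passed to the second selection stage.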
Keywords/Search Tags: Support Vector Machine, Adaptive Boosting, Particle Swarm Optimization, Classification effect, LDA topic model