The Study of Short Text Classification Based on AdaBoost-GASVM Algorithm and LDA Topic Model

Posted on: 2016-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Pi
GTID: 2308330479494266
Subject: Computational Mathematics
Abstract/Summary:
With the development of mobile Internet technology, Internet products such as WeChat, Weibo and group purchasing are gradually merging into people's daily lives. A growing number of people enjoy the convenience these products bring, and in doing so they generate massive amounts of information. A major goal is therefore to develop fast and accurate text search methods that find the desired content in massive, complex collections of text messages and support data mining on them. These text messages are short, refined, cohesive and sparse, so traditional models are poorly suited to classifying them.

Support Vector Machine (SVM), which has been widely applied in pattern recognition, offers relatively good classification performance and generalization ability. However, many problems remain to be solved, such as model and parameter selection and training efficiency on large-scale training sets.

In this paper, we present a short text classification algorithm that addresses these defects of SVM and the characteristics of short texts. It combines AdaBoost-GASVM, an AdaBoost ensemble of SVMs whose parameters are selected automatically by a genetic algorithm, with the Latent Dirichlet Allocation (LDA) topic model.

Firstly, a genetic algorithm is used to optimize the SVM parameters automatically, and ensemble learning is introduced to improve the generalization ability of SVM. The weak classifiers, which are support vector classifiers whose parameters are selected automatically by the genetic algorithm, are combined by boosting into a strong classifier, AdaBoost-GASVM. The experimental results show that the ensemble classifier is effective.

Secondly, a Bi-Gram strategy is used to expand the original features of the short text, and LDA is applied to obtain the corresponding topic distribution of the short text. Then, words associated with the topic are added to the original features as additional features. Finally, the AdaBoost-GASVM classifier is used to classify the short text. The experiments show that, compared to the traditional vector space model, which represents the features of the short text directly, the method performs better on different kinds of short text. Hence, the feature expansion method based on Bi-Gram and the LDA topic model is effective for short text classification.
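
The feature expansion step described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes an LDA model has already been fitted elsewhere and that its output is available as a per-document topic distribution (`doc_topic`) and a table of top words per topic (`topic_top_words`). Those names, the function names, the `n_topics` and `n_words` parameters, the underscore-joined bigram format, and the toy data are all hypothetical.

```python
from typing import Dict, List, Sequence


def bigrams(tokens: Sequence[str]) -> List[str]:
    """Join each pair of adjacent tokens into one bigram feature."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def expand_features(
    tokens: Sequence[str],
    doc_topic: Sequence[float],
    topic_top_words: Dict[int, List[str]],
    n_topics: int = 1,
    n_words: int = 3,
) -> List[str]:
    """Return original tokens, their bigrams, and top words of the dominant topics.

    doc_topic and topic_top_words stand in for the output of a fitted LDA
    model (an assumption of this sketch; no LDA inference happens here).
    """
    features = list(tokens) + bigrams(tokens)
    # Rank topics by their probability for this document, highest first.
    ranked = sorted(range(len(doc_topic)), key=lambda k: doc_topic[k], reverse=True)
    for k in ranked[:n_topics]:
        for word in topic_top_words.get(k, [])[:n_words]:
            if word not in features:  # skip words that are already features
                features.append(word)
    return features


# Toy inputs, made up for illustration only.
tokens = ["cheap", "group", "purchase"]
doc_topic = [0.1, 0.7, 0.2]
topic_top_words = {
    0: ["movie", "ticket", "cinema"],
    1: ["discount", "deal", "purchase"],
    2: ["weather", "rain", "sunny"],
}
expanded = expand_features(tokens, doc_topic, topic_top_words)
print(expanded)
```

In this sketch the expanded feature list would then be vectorized and passed to the classifier. How many topics and topic words to add is a tuning choice that the abstract does not specify.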
Keywords/Search Tags: Short Text Classification, LDA Topic Model, AdaBoost Algorithm, Support Vector Machine, Genetic Algorithm