In the current era of information explosion, data is growing exponentially. Most of it is unstructured and contains a great deal of important knowledge waiting to be discovered, so finding a convenient and fast way to classify text is an important problem. This paper covers three aspects. First, traditional text classification methods are limited in application and give somewhat worse classification results, whereas text classification based on topic models has great prospects and has shown clear benefits in areas such as information retrieval, new social media, sentiment analysis, academic articles, and network data; this paper discusses the application of topic models in these areas in detail. Second, this paper implements a semi-supervised LDA (Latent Dirichlet Allocation) model by adding to each topic a set of words that are strongly correlated with that topic. The model parameters are estimated by Gibbs sampling, an MCMC (Markov Chain Monte Carlo) method, and the topic probability distribution of each text is obtained at the same time. The experimental results show that the semi-supervised LDA model yields words that are more related to each topic, with a smaller offset, and that its results are significantly better than those of the unsupervised LDA model. Third, the LDA model and the semi-supervised LDA model are applied to feature selection for text classification. Compared with commonly used feature selection methods such as mutual information, information gain, document frequency, and the chi-square statistic, the experimental results show that the LDA model and the semi-supervised LDA model perform similarly to each other, and that both outperform the chi-square statistic, which performs best among the other methods.
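The semi-supervised LDA described above can be illustrated with a minimal sketch. The abstract does not specify how the per-topic word sets enter the model, so this sketch assumes one common option: each seed word receives extra pseudo-counts in the topic-word Dirichlet prior of its topic, and the model is then fitted by collapsed Gibbs sampling. The function name `seeded_lda`, the parameter `seed_boost`, and all default values are hypothetical and chosen only for illustration.

```python
import random


def seeded_lda(docs, seed_words, n_topics, n_iter=200,
               alpha=0.1, beta=0.01, seed_boost=5.0, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with a seed-word prior (sketch).

    docs: list of token lists.
    seed_words: dict mapping topic id -> set of words tied to that topic.
    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic word distributions, and the sorted vocabulary.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    w2id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Assumption: seed words get extra prior mass in their own topic.
    prior = [[beta] * V for _ in range(n_topics)]
    for k, words in seed_words.items():
        for w in words:
            if w in w2id:
                prior[k][w2id[w]] += seed_boost
    prior_sum = [sum(row) for row in prior]

    # Count tables and random initial topic assignments.
    n_dk = [[0] * n_topics for _ in docs]       # document-topic counts
    n_kw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    n_k = [0] * n_topics                        # tokens per topic
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w2id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w2id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | everything else).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + prior[t][v]) / (n_k[t] + prior_sum[t])
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the two distributions from the final counts.
    theta = [[(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in topics] for d, doc in enumerate(docs)]
    phi = [[(n_kw[t][v] + prior[t][v]) / (n_k[t] + prior_sum[t])
            for v in range(V)] for t in topics]
    return theta, phi, vocab
```

Under these assumptions, each row of `theta` is the topic distribution of one text, which is the kind of low-dimensional representation that could be passed to a classifier in place of word-level features. Setting `seed_words` to an empty dict reduces the sketch to ordinary unsupervised LDA with a symmetric prior.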