Research and Applications for Text Categorization Based on Topic Model

Posted on: 2015-01-30 | Degree: Master | Type: Thesis
Country: China | Candidate: S Z Zheng
GTID: 2428330491460276 | Subject: Detection technology and automation equipment
Abstract/Summary:
We live in an era of information explosion and exponential data growth. Most of this data is unstructured and holds a great deal of valuable knowledge waiting to be discovered, so finding a convenient and fast way to classify text is an important problem. This thesis covers three aspects.

First, applications. Traditional text classification methods have limited applicability and give somewhat weaker results, whereas text classification based on topic models has broad prospects and has shown clear benefits in areas such as information retrieval, new social media, sentiment analysis, academic articles, and network data. The thesis discusses the application of topic models in each of these areas in detail.

Second, a semi-supervised LDA (Latent Dirichlet Allocation) model. The thesis implements it by attaching to each topic a set of words that are strongly correlated with that topic. The model parameters are estimated by Gibbs sampling, an MCMC (Markov Chain Monte Carlo) method, which also yields the topic distribution of each text. In the experiments, the semi-supervised LDA model assigns more topic-related words to each topic with a smaller topic offset, and its results are significantly better than those of the unsupervised LDA model.

Third, feature selection. Both the LDA model and the semi-supervised LDA model are applied to feature selection for text classification and compared with commonly used feature selection methods: mutual information, information gain, document frequency, and the chi-square statistic. The experiments show that the two LDA-based methods perform similarly to each other, and both outperform the chi-square statistic, which performed best among the other methods.
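The Gibbs sampling estimation of a semi-supervised LDA model can be sketched with the standard collapsed Gibbs sampler, whose full conditional is p(z_i = k | rest) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ). The abstract does not say exactly how the per-topic word sets enter the model, so this sketch assumes one simple option: each seed word is fixed to its topic (a hard constraint) and is never resampled. The function name `seeded_lda_gibbs`, the `seed_words` mapping, and the default hyperparameters are hypothetical, not taken from the thesis.

```python
import numpy as np


def seeded_lda_gibbs(docs, vocab_size, num_topics, seed_words=None,
                     alpha=0.1, beta=0.01, iterations=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with optional per-topic seed words.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    seed_words: hypothetical dict {topic_id: set of word ids}. ASSUMPTION: a
        seed word is always assigned to its topic and is never resampled.
    Returns (theta, phi): document-topic and topic-word distributions.
    """
    rng = np.random.default_rng(rng_seed)
    seed_topic = {w: k for k, words in (seed_words or {}).items() for w in words}

    n_dk = np.zeros((len(docs), num_topics))   # topic counts per document
    n_kw = np.zeros((num_topics, vocab_size))  # word counts per topic
    n_k = np.zeros(num_topics)                 # total tokens per topic
    z = []                                     # topic assignment of each token

    # Random initialization; seed words start (and stay) in their seed topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = seed_topic.get(w, rng.integers(num_topics))
            zd.append(k)
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                if w in seed_topic:
                    continue  # hard constraint: seed words keep their topic
                k = z[d][i]
                # Remove the current token from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Full conditional over topics, up to a normalizing constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(num_topics, p=p / p.sum())
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + num_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + vocab_size * beta)
    return theta, phi
```

For example, `seeded_lda_gibbs([[0, 1, 2, 0, 1, 2], [3, 4, 5, 3, 4, 5], [0, 1, 3, 4]], 6, 2, seed_words={0: {0}, 1: {3}})` anchors word 0 to topic 0 and word 3 to topic 1, so the co-occurring words tend to be pulled into the matching topics. A softer alternative is to raise the prior β only for seed words in their topic; which variant the thesis used is not stated.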
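The chi-square statistic, the strongest of the baseline feature selection methods in the comparison, scores each term by how strongly its presence depends on the class. With A = documents in class c containing term t, B = documents outside c containing t, C = documents in c without t, D = documents outside c without t, and N = A + B + C + D, the statistic is χ²(t, c) = N(AD − CB)² / ((A + C)(B + D)(A + B)(C + D)). The sketch below assumes document-level presence counts and takes the maximum over classes as the term's score; both are common choices, not details confirmed by the thesis, and the function names are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic against the class labels.

    docs: list of token lists; labels: one class label per document.
    ASSUMPTION: document-level presence/absence, max over classes.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    class_count = Counter(labels)
    term_count = Counter(t for s in doc_sets for t in s)
    term_class = Counter((t, cls) for s, cls in zip(doc_sets, labels) for t in s)

    scores = {}
    for t, df in term_count.items():
        best = 0.0
        for cls, n_cls in class_count.items():
            a = term_class[(t, cls)]  # in class cls, contains t
            b = df - a                # outside cls, contains t
            c = n_cls - a             # in class cls, lacks t
            d = n - a - b - c         # outside cls, lacks t
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # skip degenerate tables (e.g. t occurs in every document)
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[t] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On the toy corpus `[["good", "great"], ["good", "fine"], ["bad", "awful"], ["bad", "poor"]]` with labels `["pos", "pos", "neg", "neg"]`, the terms "good" and "bad" occur only in one class and in every document of it, so they receive the highest score and are selected first.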
Keywords: topic model, text classification, LDA model, semi-supervised LDA model