Research And Applications For Text Categorization Based On Topic Model

Posted on:2015-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:S Z Zheng

Full Text:PDF

GTID:2428330491460276

Subject:Detection technology and automation equipment

Abstract/Summary:

We live in an era of information explosion and exponential data growth. Most of this data is unstructured and contains a great deal of important knowledge waiting to be discovered, so finding a convenient and fast way to classify text is an important problem. This thesis covers three main aspects. First, it surveys applications of text classification based on topic models. Traditional text classification methods have limited applicability and give somewhat worse classification results, whereas text classification based on topic models shows great promise, with clear benefits in areas such as information retrieval, new social media, sentiment analysis, academic articles, and network data. The thesis discusses the use of topic models in each of these areas in detail. Second, it implements a semi-supervised LDA (Latent Dirichlet Allocation) model. The approach is to attach to each topic a set of words that are strongly correlated with that topic. The model parameters are estimated by Gibbs sampling, a Markov Chain Monte Carlo (MCMC) method, which also yields the topic probability distribution of each text. In the experiments, the semi-supervised LDA model produces more topic-related words with a smaller offset, and its results are significantly better than those of the unsupervised LDA model. Third, the LDA model and the semi-supervised LDA model are applied to feature selection for text classification and compared with commonly used feature selection methods: mutual information, information gain, document frequency, and the chi-square statistic. The experimental results show that the two LDA-based methods perform similarly to each other, and that both outperform the chi-square statistic, which performs best among the other methods.
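
As an illustration of the second aspect, the following is a minimal sketch of a seeded LDA model estimated by collapsed Gibbs sampling. The abstract does not say how the per-topic word sets enter the model, so the mechanism used here is an assumption: each seed word receives extra prior mass (`boost`) in its own topic's topic-word prior. The function name `seeded_lda`, its parameters, and the input format are hypothetical and are not taken from the thesis.

```python
import random


def seeded_lda(docs, seeds, n_topics, alpha=0.1, beta=0.01,
               boost=5.0, iters=200, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with per-topic seed words.

    docs:  list of token lists
    seeds: dict mapping topic id -> set of seed words (assumed format)
    boost: extra prior mass for a seed word in its topic (assumption)
    Returns (theta, phi, vocab): document-topic and topic-word distributions.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    topics = list(range(n_topics))

    # Asymmetric topic-word prior: seed words get beta + boost in their topic.
    prior = [[beta] * n_words for _ in topics]
    for k, words in seeds.items():
        for w in words:
            if w in wid:
                prior[k][wid[w]] += boost
    prior_sum = [sum(row) for row in prior]

    # Count tables and a random initial topic assignment for every token.
    ndk = [[0] * n_topics for _ in docs]    # document-topic counts
    nkw = [[0] * n_words for _ in topics]   # topic-word counts
    nk = [0] * n_topics                     # tokens per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = wid[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha)
                    * (nkw[t][v] + prior[t][v]) / (nk[t] + prior_sum[t])
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior mean estimates from the final sample.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
              for t in topics] for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + prior[t][v]) / (nk[t] + prior_sum[t])
            for v in range(n_words)] for t in topics]
    return theta, phi, vocab
```

Calling `seeded_lda(docs, {0: {"goal"}, 1: {"vote"}}, n_topics=2)` on a tokenized corpus returns one topic distribution per document in `theta`. Those rows could then serve as low-dimensional features for a classifier, which is one plausible reading of the feature selection experiments described above.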

Keywords/Search Tags:

topic model, text classification, LDA model, semi-supervised LDA model

Related items

1	Research And Applications For Text Categorization Based On Topic Model
2	Citation Importance Classification Towards Scholarly Full-text Articles And Its Application In Topic Identification Of Scientific Literature
3	Research And Application Of The Multi-labeled HDP Text Topic Model
4	A Study On Weakly-Supervised Text Classification By Incorporating Neural Topic Model For Supervision Generation
5	Research On Short Text Classification Method Based On Semi-Supervised BTM Model
6	Research On Deep Learning Text Classification Based On Fusion Topic Features
7	Supervised Topic Model
8	Topic Modeling Approaches For Supervised Document Classification
9	Research On Short Text Classification Of Semi-supervised Pre-training Based On Autoencoders And Word Order Dependencies
10	Short Text Topic Model With Word Discrimination Learning