Research And Application Of The Multi-labeled HDP Text Topic Model

Posted on:2018-06-18

Degree:Master

Type:Thesis

Country:China

Candidate:T Zheng

Full Text:PDF

GTID:2428330542990550

Subject:Information management and information systems

Abstract/Summary:

With the advent of the big data era, Internet resources, led by text, are growing exponentially. How to mine potential resources, data value, and content of interest to customers from such huge and complex text data by clear methods has become a focus of academic and industrial research. To address this problem, scholars have approached text data through association rule mining, text classification, feature selection, text clustering, topic mining, and so on. This thesis, which is text-based, uses hidden-topic modeling to construct a topic model and finds the semantic topics across texts that express their core meaning. It aims to provide an improved and useful algorithm for text clustering and classification.

The Hierarchical Dirichlet Process (HDP) topic model, based on Bayesian theory and the Dirichlet process, can automatically learn the optimal structure of the topic set from the data. In practice, however, the best topic set obtained by reducing the dimensionality of the text set does not match the semantic requirements, and some existing labeled topic models also require parameters that are very difficult to set. Therefore, this thesis proposes a semi-supervised labeled HDP topic model (SLHDP) and an accuracy evaluation index for random clusters (sk-measure), both based on partially known semantic labels. The theoretical framework of the SLHDP model is built up step by step, from parameter definition and variable mapping to model architecture and the derivation of the basic formulas. The graphical model is also associated with the physical process of hidden-topic clustering of text: the topic set is generated by Gibbs sampling under the Chinese restaurant franchise (CRF) and stick-breaking constructions, and the model is explained in detail in terms of both structure and semantics.

We apply the SLHDP model to Chinese and English data sets. In a case study on an English news data set, cross-validation is used to discuss the model parameters. The model with the optimal parameters obtained from these experiments is then applied to the test set. Compared with the SLLDA and HDP models on three evaluation indexes, the experimental results show that the SLHDP model makes the composition of the topic set more reasonable in text classification on large-scale data sets. The thesis also considers applications of the SLHDP model, extending its scope to practical problems in other fields. Finally, it summarizes the shortcomings of this work and outlines prospects for future research.
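
The stick-breaking construction named in the abstract can be illustrated with a minimal Python sketch. This is not the SLHDP model itself, only the standard truncated stick-breaking draw of topic weights for a single Dirichlet process; the function name `stick_breaking_weights`, the truncation level `num_sticks`, and the concentration value `alpha=1.0` are assumptions chosen for illustration and do not come from the thesis.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng=None):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    alpha      -- concentration parameter (assumed value, not from the thesis)
    num_sticks -- truncation level, i.e. how many topic weights to draw
    Returns (weights, remaining), where `remaining` is the stick mass
    left unassigned after truncation.
    """
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        # Break off a Beta(1, alpha) fraction of what is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


# Hypothetical usage: 20 topic weights with a fixed seed for repeatability.
weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=20,
                                           rng=random.Random(0))
```

The weights and the leftover mass always sum to one, so a smaller `alpha` concentrates mass on the first few topics while a larger `alpha` spreads it over many. In an HDP, a draw of this kind would define the shared global topic proportions, from which each document's proportions are then drawn.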

Keywords/Search Tags:

Label, Semi-Supervised, HDP, Topic Model, Random Cluster

Related items

1	Topic Modeling Approaches For Supervised Document Classification
2	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
3	Semi-supervised Clustering Algorithm Based On Label Propagation
4	Research On Semi-supervised Topic Model For Text Classification
5	AutoLink Semi-supervised Multi-label Study Of Literature Research And Implementation Methods
6	Research On Weakly Supervised Learning Based On Controlled Random Walk Model
7	Distributed Semi-Supervised Learning
8	The Study Of Robust Semi-Supervised Classification Algorithm Based On Label Prediction And Propagation
9	Research On The Application Of Semi-supervised Learning In Natural Language Processing
10	A Semi-supervised Based Method For Entity Set Expansion