
Short Text Topic Model With Word Discrimination Learning

Posted on: 2019-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Niu
Full Text: PDF
GTID: 2428330545452252
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the Web 2.0 era and the widespread adoption of social media, short texts appear in every corner of the Internet: information retrieval queries, advertising keywords, page titles, anchor texts, online questions, microblogs, and reviews are all short texts. Short texts are updated quickly, easy to produce, rich in content, and large in scale, but each individual text carries little information. Because the number of words is small, there is not enough statistical evidence for inference, so understanding the semantics of short texts is a major challenge. In addition, because short texts often do not follow standard grammar, traditional natural language processing techniques such as part-of-speech tagging and syntactic parsing are difficult to apply to them directly. Nevertheless, short text understanding is basic research underpinning the development of artificial intelligence, and it is of crucial importance to many practical application scenarios.

Text clustering is a basic method of text analysis, and the topic model is an effective method for short text clustering, but it faces high dimensionality and sparsity in short text applications. In particular, the lack of word co-occurrence information makes it difficult for a topic model to mine the underlying cluster structure. Our study found that a small number of words in a short text's word vector are particularly important for learning the cluster structure, while the influence of noise words is correspondingly more pronounced. We therefore propose a framework for short text topic models with word discrimination learning: binomial distributions are introduced into the LDA, BTM, and GSDMM models to learn each word's discriminative power over the cluster structure. Experimental results on multiple benchmark data sets show that the new word-discrimination models LDA-?, BTM-?, and GSDMM-? not only improve the learning of the cluster structure but also accelerate the convergence of the original models.

To further improve the effectiveness of topic models in short text clustering, we use a small number of samples with supervision information to guide the clustering process. Using multi-conditional learning theory, the LDA, BTM, and GSDMM models are extended to the semi-supervised clustering models Semi-LDA, Semi-BTM, and Semi-GSDMM, which learn a latent structure over both labeled and unlabeled samples. Experiments on several benchmark data sets, including comparisons with the semi-supervised topic models Semi-LDA-?, Semi-BTM-?, and Semi-GSDMM-?, show that adding supervision information improves the effectiveness of topic models in short text clustering.
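The abstract does not give the formula by which word discriminative power is learned inside the samplers, so the following is only a minimal post-hoc proxy for the same notion: given word counts per cluster, a word concentrated in one cluster scores high, while a word spread evenly across clusters (a likely noise word) scores near zero. The function name, the smoothing constant, and the KL-divergence score are illustrative assumptions, not the thesis's model.

```python
from math import log

def discrimination(counts, smooth=0.1):
    """Score each word's discriminative power over a cluster structure.

    `counts[k][w]` is word w's count in cluster k.  The score is the KL
    divergence between p(cluster | word) and the clusters' overall size
    distribution: a word whose occurrences track cluster membership
    diverges strongly from the base rates; a noise word does not.
    """
    K = len(counts)
    vocab = set().union(*[set(c) for c in counts])
    totals = [sum(c.values()) for c in counts]
    grand = sum(totals)
    scores = {}
    for w in vocab:
        cw = [counts[k].get(w, 0) + smooth for k in range(K)]
        s = sum(cw)
        kl = 0.0
        for k in range(K):
            p = cw[k] / s                               # p(cluster | word)
            q = (totals[k] + smooth) / (grand + K * smooth)  # cluster base rate
            kl += p * log(p / q)
        scores[w] = kl
    return scores
```

On a toy two-cluster count table, cluster-specific words such as "apple" or "cpu" score far above a word like "the" that appears equally in both clusters, which is the behavior the word-discrimination models exploit.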
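The abstract likewise does not spell out the Semi-GSDMM inference procedure. As one plausible reading, the sketch below implements a collapsed Gibbs sampler for a Dirichlet multinomial mixture in the GSDMM style, where documents with a known label are clamped to their cluster and only unlabeled documents are resampled. The function name, hyperparameter values, and the clamping scheme are illustrative assumptions.

```python
import random
from collections import Counter

def semi_gsdmm(docs, K, labels=None, alpha=0.1, beta=0.1, iters=100, seed=0):
    """GSDMM-style collapsed Gibbs sampler over short texts (token lists).

    Documents with an integer entry in `labels` are clamped to that cluster
    (the semi-supervised case); documents whose label is None are resampled
    on every sweep.  Returns the final cluster assignment per document.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})
    D = len(docs)
    labels = labels or [None] * D
    z = [labels[d] if labels[d] is not None else rng.randrange(K)
         for d in range(D)]
    m = [0] * K                        # documents per cluster
    n = [0] * K                        # word tokens per cluster
    nw = [Counter() for _ in range(K)] # word counts per cluster
    for d, k in enumerate(z):
        m[k] += 1; n[k] += len(docs[d]); nw[k].update(docs[d])

    def score(d, k):
        # p(z_d = k | rest): mixture weight times the DMM word likelihood
        p = m[k] + alpha
        i = 0
        for w, c in Counter(docs[d]).items():
            for j in range(c):
                p *= (nw[k][w] + beta + j) / (n[k] + V * beta + i)
                i += 1
        return p

    for _ in range(iters):
        for d in range(D):
            if labels[d] is not None:
                continue               # clamped: keep the supervised cluster
            k = z[d]
            m[k] -= 1; n[k] -= len(docs[d]); nw[k].subtract(docs[d])
            k = rng.choices(range(K), weights=[score(d, j) for j in range(K)])[0]
            z[d] = k
            m[k] += 1; n[k] += len(docs[d]); nw[k].update(docs[d])
    return z
```

With a few documents clamped per cluster, the remaining documents are pulled toward the cluster whose labeled seeds share their vocabulary, which mirrors the abstract's claim that a small amount of supervision information guides the clustering of unlabeled short texts.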
Keywords/Search Tags: Short Text, Clustering, Discrimination, Topic Model, Semi-supervised