
Research On Short Text Topic Mining Algorithm

Posted on: 2019-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Yan
Full Text: PDF
GTID: 2438330566473397
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the mobile Internet, the Internet has become an important platform for people to learn and communicate. Social platforms such as Tencent WeChat and Sina Weibo are constantly changing people's lives. Meanwhile, the volume of short texts generated on the Internet has grown explosively, far beyond what people imagined. Mining the topic information of massive short texts has important practical significance for tracking hot topics and understanding users' opinions and interests. Traditional topic discovery approaches perform poorly on short texts because of their sparse representation, so improving topic discovery for short texts is a current research focus. This thesis mainly uses content-rich long documents to address the sparse representation problem of short texts and, building on existing document clustering approaches and topic models, studies topic discovery both when the number of topics is known in advance and when it is unknown. The main research work and achievements include:

(1) For topic discovery on short texts with a known number of topics, a Dual Dirichlet Multinomial Regression (DDMR) model, which interprets short texts with the help of auxiliary long documents, is proposed based on the Latent Dirichlet Allocation (LDA) model and the Dirichlet Multinomial Regression (DMR) model. Long documents and short texts, which come from different data sources, share a single topic set, and two Dirichlet priors are used to generate the topic allocations of the long documents and of the short texts. This allows the topic knowledge of long documents to be transferred to short texts and improves topic mining for short texts.

(2) For topic discovery on short texts with an unknown number of topics, a Dual Dirichlet Multinomial Allocation with feature selection (DDMAfs) model is proposed. The DDMAfs model is based on the Dirichlet Multinomial Allocation (DMA) model, which can automatically identify the number of topics of short texts, and a topic set is shared by the discriminative words of long documents and short texts to improve topic discovery for short texts.

(3) Extensive experiments show that the DDMR and DDMAfs models proposed in this thesis effectively improve topic mining for short texts.
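
The shared-topic idea in contribution (1) can be illustrated with a small generative sketch. This is not the DDMR model itself: it assumes fixed symmetric Dirichlet priors and omits the regression on document features that DMR uses, and all names and values (`generate_corpus`, `alpha_long`, `alpha_short`, the corpus sizes) are hypothetical choices made for illustration. It only shows long documents and short texts drawing words from one shared set of topics while each source has its own prior over topic proportions.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, alpha, phi, rng):
    """Draw documents LDA-style: per-document topic mix, then words.

    phi is the shared (n_topics x vocab_size) topic-word matrix;
    alpha is the symmetric Dirichlet concentration for this source.
    """
    n_topics, vocab_size = phi.shape
    docs = []
    for _ in range(n_docs):
        # Topic proportions for this document, drawn from the source's prior.
        theta = rng.dirichlet(np.full(n_topics, alpha))
        # One topic assignment per word position.
        topics = rng.choice(n_topics, size=doc_len, p=theta)
        # Each word is drawn from the shared distribution of its topic.
        words = np.array([rng.choice(vocab_size, p=phi[z]) for z in topics])
        docs.append(words)
    return docs


rng = np.random.default_rng(0)
n_topics, vocab_size = 5, 50

# Shared topic set: one word distribution per topic, used by both sources.
phi = rng.dirichlet(np.full(vocab_size, 0.1), size=n_topics)

# Two priors (assumed values): one for long documents, one for short texts.
alpha_long, alpha_short = 0.5, 0.1
long_docs = generate_corpus(20, 200, alpha_long, phi, rng)
short_texts = generate_corpus(100, 8, alpha_short, phi, rng)
```

Because both corpora index into the same `phi`, anything learned about the topic-word distributions from the long documents applies directly to the short texts, which is the transfer effect the abstract describes.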
Keywords/Search Tags: short texts, topic mining, long documents, DDMR model, DDMAfs model