
Design and Implementation of a Topic Mining System for Union Theme Microblogs Based on Comments and Retransmissions

Posted on: 2017-02-10  Degree: Master  Type: Thesis
Country: China  Candidate: C S Zhao  Full Text: PDF
GTID: 2308330503453783  Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, microblogs are playing an increasingly important role as a new form of online media through which people get news, entertainment, and other everyday information. Compared with traditional written articles, microblogs have distinctive characteristics: posts are short, they support real-time reposting and commenting, and topics spread quickly and are highly focused. These characteristics make microblogs a new area of research. Topic detection is currently among the most popular research directions for analyzing and classifying large volumes of microblog data.

The innovations of this paper are as follows:

(1) Microblogs are short, carry little information, and use irregular grammar, so traditional topic classification methods perform unsatisfactorily on them. The Labeled LDA topic model attaches classification labels to the original LDA model to help compute the latent topics, but it still produces ambiguous assignments for microblogs whose topics occur with nearly equal frequency. This paper proposes a Union Labeled LDA model that uses comments and retransmissions to enrich the label information, strengthening the supervision that label frequency provides.

(2) This paper also discusses how to use the microblog open platform API to collect microblog data quickly and easily, together with the associated comments and forwarding information. We designed a recursive algorithm that reads and segments the microblogs and stores them in a corresponding database structure.
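The recursive reading step in (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Post` record, the `walk` function, and the whitespace `tokenize` stand-in are all hypothetical, and the sketch traverses an in-memory tree rather than calling any real open platform API. It only shows how a post and its nested comments and reposts could be flattened into rows suitable for database storage.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Post:
    """Hypothetical in-memory record; not the open platform's actual schema."""
    post_id: str
    text: str
    kind: str = "original"  # "original", "comment", or "repost"
    children: List["Post"] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Stand-in for a real word segmenter: lowercase and split on whitespace.
    return text.lower().split()


Row = Tuple[str, str, str, int, List[str]]


def walk(post: Post, root_id: Optional[str] = None, depth: int = 0) -> Iterator[Row]:
    """Yield one flat row per post: (post_id, root_id, kind, depth, tokens)."""
    root_id = root_id or post.post_id
    yield (post.post_id, root_id, post.kind, depth, tokenize(post.text))
    for child in post.children:
        # Recurse into comments and reposts, which may have replies of their own.
        yield from walk(child, root_id, depth + 1)


# Assumed example data: one original post with a comment and a commented repost.
thread = Post("m1", "New phone released today", children=[
    Post("c1", "Great camera", kind="comment"),
    Post("r1", "Price is too high", kind="repost", children=[
        Post("c2", "Agreed", kind="comment"),
    ]),
])

rows = list(walk(thread))
for row in rows:
    print(row)
```

Each row keeps the identifier of the root post, so the comments and reposts can later be grouped with the original microblog they belong to.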
In addition, this paper considers data preprocessing, including the replacement of network symbols and the expansion of emotional words, which makes the final output of the topic model more accurate.

In summary, this paper focuses on increasing the frequency of the central theme label in order to improve the supervision and training process of Labeled LDA, by introducing comments and retransmissions and analyzing the unique structure of microblogs. We also give attention to the mathematical definition of correlation and the quantitative study of topics. The system implementation chapter discusses data acquisition and preprocessing through the microblog open platform API, as well as the superposition of label vectors. Using the related microblog data, we obtain more accurate document-topic and topic-word probability distributions. Finally, we compare the results of the new topic model with those of several traditional topic models, and examine how the new model behaves as its parameters are adjusted.
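As a rough illustration of label vector superposition, the sketch below adds the label counts found in comments and reposts to those of the original post, so that the central theme label gains frequency. The function name `union_label_vector`, the `related_weight` parameter, and the normalisation step are assumptions made for this example; the abstract does not state the exact formula the thesis uses.

```python
from collections import Counter
from typing import Dict, Iterable, List


def union_label_vector(
    original_labels: Iterable[str],
    related_labels: Iterable[List[str]],
    related_weight: float = 1.0,  # assumed weight for comment/repost labels
) -> Dict[str, float]:
    """Superpose label counts from comments and reposts onto the original post's labels."""
    counts: Counter = Counter()
    for label in original_labels:
        counts[label] += 1.0
    for labels in related_labels:
        for label in labels:
            counts[label] += related_weight
    total = sum(counts.values())
    if not total:
        return {}
    # Normalise so the vector can be read as relative label frequencies.
    return {label: value / total for label, value in counts.items()}


# Assumed example: two labels tied in the original post.
original = ["sports", "finance"]
# Labels found in each comment or repost (the last one carries no label).
related = [["sports"], ["sports", "tech"], []]

vector = union_label_vector(original, related)
print(vector)
```

In this example the original post alone gives "sports" and "finance" equal weight; after the labels from comments and reposts are added, "sports" dominates, which is the kind of disambiguation the abstract attributes to the union model.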
Keywords/Search Tags:microblog, topic mining, Union Labeled LDA, label, frequency