
Research On Microblog Topic Detection Method Combining Word2vec And Single-Pass

Posted on: 2020-04-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Q Zhou
GTID: 2438330575959328 | Subject: Computer application technology
Abstract/Summary:
In recent years, micro-blogs have become increasingly popular with the public because of their grassroots character, convenience and rapid dissemination of current events. They have become the mainstream medium through which the Chinese public follows current events and takes part in discussions of hot topics. A micro-blog is a broadcast-style social media and networking platform on which users share short, real-time text through a follow mechanism. On this platform, information is shared, disseminated and acquired through user relationships, and the communication among users gives rise to a wide variety of topics. The immediacy of micro-blogs greatly accelerates the development of topics, so popular topics form and spread quickly. Some micro-blog topics that spark discussion among netizens draw tens of millions of users who read and interact with them. Such topics usually carry important information and have strong social influence, and they have attracted the attention of many experts and scholars. Quickly mining hot topics from large volumes of micro-blog text is therefore of great significance. Accordingly, this thesis focuses on improving the accuracy of micro-blog topic detection in the following three respects:

(1) A text tree representation method based on Word2vec and sentence structure is proposed to improve the accuracy of text similarity calculation. First, feature words are extracted from the text. The other words obtained by segmenting the text are then scored against each feature word with the Pearson correlation coefficient, and the scores are used to build a content tree. Second, the content tree is used to associate each word with its adjacent words, producing word vectors that depend on sentence structure. The resulting word vectors are then averaged to obtain a vector representation of the sentence. Finally, the method is validated on Chinese text classification and text similarity calculation.

(2) A single-pass hierarchical clustering algorithm based on micro-blog content is proposed to improve the accuracy of micro-blog topic detection. First, the crawled micro-blog posts are split by content. Posts with little content are set aside. Posts with more content and richer topics are clustered first with the single-pass algorithm to form topic centers. The short posts are then fed in and clustered against the micro-blog texts that already form topic centers, which ultimately improves the accuracy of micro-blog topic detection.

(3) A micro-blog topic detection system is designed and implemented, and the improved text similarity and topic clustering algorithms are applied in it. The two algorithms above serve as the theoretical basis of the system. The corresponding functional modules are analyzed and designed, and the micro-blog topic detection system is then implemented. The system can accurately identify popular micro-blog topics within a given time period.
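Step (1) scores segmented words against a feature word with the Pearson correlation coefficient. The abstract does not say which quantities are correlated. The minimal Python sketch below therefore assumes that each word is represented by its occurrence counts across a set of posts. The helper `related_words`, the `counts` layout and the 0.5 cut-off are hypothetical illustrations, not the thesis's content-tree construction.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # r is undefined for a constant sequence; treated as 0 here
    return cov / (sx * sy)


def related_words(feature, counts, threshold=0.5):
    """Hypothetical helper (assumption, not from the thesis): words whose
    per-post count profile correlates with the feature word's at
    r >= threshold, strongest first."""
    base = counts[feature]
    scored = [(w, pearson(base, v)) for w, v in counts.items() if w != feature]
    return sorted((p for p in scored if p[1] >= threshold),
                  key=lambda p: p[1], reverse=True)


# Hypothetical occurrence counts of each word in five posts.
counts = {
    "earthquake": [3, 0, 2, 0, 1],
    "rescue": [2, 0, 2, 0, 1],
    "concert": [0, 3, 0, 2, 0],
}
print(related_words("earthquake", counts))  # only "rescue" passes the cut-off
```

Here "rescue" rises and falls with "earthquake" (r ≈ 0.96), while "concert" is negatively correlated and is dropped.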
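Step (1) ends by averaging word vectors into a sentence vector, which is then used for text similarity. The sketch below shows only plain averaging followed by a similarity score. It does not reproduce the thesis's structure-dependent word vectors. The toy 3-dimensional `emb` table is a hypothetical stand-in for vectors that a trained Word2vec model would supply. Cosine similarity is an assumed choice of measure, because the abstract does not name one.

```python
import math

Vector = list[float]


def sentence_vector(tokens: list[str], embeddings: dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens; zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


# Toy 3-D embeddings (hypothetical stand-ins for trained Word2vec vectors).
emb = {
    "earthquake": [0.9, 0.1, 0.0],
    "rescue": [0.8, 0.2, 0.1],
    "concert": [0.0, 0.1, 0.9],
    "ticket": [0.1, 0.0, 0.8],
}
s1 = sentence_vector(["earthquake", "rescue"], emb, 3)
s2 = sentence_vector(["rescue", "earthquake", "unseen"], emb, 3)  # OOV token skipped
s3 = sentence_vector(["concert", "ticket"], emb, 3)
print(cosine(s1, s2), cosine(s1, s3))
```

Plain averaging ignores word order, so `s1` and `s2` come out identical. The thesis's sentence-structure step is presumably aimed at exactly this limitation.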
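Step (2) clusters content-rich posts first to form topic centers and only then assigns the short posts. The minimal single-pass sketch below illustrates that ordering under several assumptions. Posts are compared by cosine similarity against incrementally updated cluster centroids. A post joins a cluster only if its similarity reaches a fixed `threshold`. A post counts as "long" if its token count reaches `min_len`. All of these are hypothetical parameters, since the abstract specifies none of them.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def single_pass(items, threshold, clusters=None):
    """Single-pass clustering: visit each (doc_id, vector) once, in order.

    A document joins the most similar existing cluster if its cosine similarity
    to that cluster's centroid is >= threshold; otherwise it starts a new
    cluster. Passing `clusters` continues clustering on top of existing ones.
    """
    clusters = [] if clusters is None else clusters
    for doc_id, vec in items:
        best, best_sim = None, -1.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(doc_id)
            n = len(best["members"])
            # Running mean: the centroid moves 1/n of the way toward the new vector.
            best["centroid"] = [c + (x - c) / n for c, x in zip(best["centroid"], vec)]
        else:
            clusters.append({"centroid": list(vec), "members": [doc_id]})
    return clusters


def two_stage_single_pass(docs, min_len, threshold):
    """docs: (doc_id, token_count, vector) triples; min_len is a hypothetical cut-off."""
    long_posts = [(d, v) for d, n, v in docs if n >= min_len]
    short_posts = [(d, v) for d, n, v in docs if n < min_len]
    clusters = single_pass(long_posts, threshold)         # stage 1: topic centers
    return single_pass(short_posts, threshold, clusters)  # stage 2: attach short posts


# Toy 2-D vectors (hypothetical); real input would be sentence vectors.
docs = [
    ("a", 20, [1.0, 0.0]),
    ("b", 18, [0.0, 1.0]),
    ("c", 5, [0.9, 0.1]),
    ("d", 4, [0.1, 0.9]),
]
clusters = two_stage_single_pass(docs, min_len=10, threshold=0.8)
print([c["members"] for c in clusters])  # [['a', 'c'], ['b', 'd']]
```

Seeding centers from long posts keeps sparse short posts from opening spurious topics early in the pass. In this sketch, a short post that matches no existing center still opens a new cluster; the abstract does not say how the thesis handles that case.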
Keywords/Search Tags: topic detection, single-pass, word2vec, content tree