Research On Topic Detection And Tracking Based On Attention Mechanism

Posted on: 2020-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: M L Niu
Full Text: PDF
GTID: 2428330590454833
Subject: Computer technology
Abstract/Summary:
With the rapid development of network technology and the widespread adoption of the mobile Internet, traditional news media such as newspapers, television, and magazines can no longer satisfy the audience's demand for information. More and more netizens are turning to new, Internet-based electronic media. As one such platform, microblogging is favored by a growing number of users for its flexibility and convenience, and the volume of microblog data grows with the user base. At the same time, because of a lack of supervision, microblogs and other emerging electronic media allow false information and sensitive content involving violence, reactionary views, and terrorism to spread unchecked on the Internet, which seriously harms the healthy development of society and the long-term stability of the country. Topic detection and tracking on microblog data can therefore help to grasp the dynamics of online public opinion and improve the ability of regulators and government security departments to give early warning of real-world emergencies.

Because microblog text uses irregular wording and varies in length, traditional clustering algorithms and topic detection and tracking techniques perform poorly in microblog public opinion analysis. This paper analyzes the structural characteristics of microblog text and improves traditional text-based public opinion analysis algorithms by combining them with a word vector model and deep learning methods. The main contents of this paper include the following three aspects:

(1) To improve the clustering of microblog text into topic clusters, the Word2vec word vector model is fused into the single-pass incremental clustering algorithm to mine deep semantic features of the text. Topic clustering first preprocesses the microblog text, including denoising, Chinese word segmentation, and stop-word removal. Second, a sentence-level vector space model is constructed from the word segmentation results. The single-pass incremental clustering algorithm based on Word2vec sentence vectors can effectively mine the deep semantic information of the text and avoids the high dimensionality of the vector space model (VSM), which slows down processing. It also addresses a weakness of traditional TF-IDF statistics, which ignore how feature words are distributed between and within classes. Experiments show that combining the single-pass incremental clustering algorithm with Word2vec sentence vectors effectively improves clustering performance.

(2) The text characteristics of the topic clusters are analyzed, and the LDA topic model is used to discover topics in the microblog clusters. Based on the clustering results, the probability distribution of keywords over the topics is calculated, and the keywords with the highest probabilities are used to represent each topic.

(3) A topic tracking method based on attention and text feature fusion is proposed. Representing text features with the vector space model (VSM) is prone to data sparsity, which leads to poor text classification. In this paper, an attention mechanism combined with a double-layer LSTM is used to represent text features. A CNN module is then used to strengthen feature fusion, which enriches the extracted text features and makes them more comprehensive and refined. This improves the model's ability to track topics in microblog text.
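The single-pass incremental clustering step in aspect (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say how sentence vectors are built, which similarity measure is used, or what threshold applies, so averaging word vectors, cosine similarity, the running-mean centroid update, and the threshold of 0.8 are all assumptions. The tiny `toy_vectors` table stands in for a trained Word2vec model and is purely hypothetical.

```python
import math

def sentence_vector(tokens, word_vectors):
    """Average the vectors of known tokens (assumed sentence representation)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token has a vector
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(vectors, threshold=0.8):
    """Visit each vector once: join the most similar cluster or open a new one."""
    centroids, counts, labels = [], [], []
    for vec in vectors:
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Update the centroid as a running mean of its members (assumption).
            n = counts[best]
            centroids[best] = [(c * n + x) / (n + 1)
                               for c, x in zip(centroids[best], vec)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels, centroids

# Hypothetical 2-dimensional "word vectors" standing in for a trained model.
toy_vectors = {
    "flood": [1.0, 0.0], "rain": [0.9, 0.1],
    "match": [0.0, 1.0], "goal": [0.1, 0.9],
}
# Posts are assumed to be already denoised, segmented, and stop-word filtered.
posts = [["flood", "rain"], ["match", "goal"],
         ["rain", "flood", "unknown"], ["goal"]]
vectors = [sentence_vector(p, toy_vectors) for p in posts]
labels, centroids = single_pass(vectors, threshold=0.8)
print(labels)
```

Because each post is compared only with the current cluster centroids, the result depends on the order in which posts arrive and on the chosen threshold. A lower threshold merges more posts into fewer topics.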
Keywords/Search Tags: text clustering, topic discovery, topic detection and tracking, text classification, attention mechanism