
The Hot Topic Discovery And Prediction Methods For Short Text Content

Posted on: 2021-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: H M Li
Full Text: PDF
GTID: 2518306032459264
Subject: Software engineering
Abstract/Summary:
With the continuous innovation of information technology, the ways in which Internet users communicate every day are becoming increasingly diverse. As a new type of network media, microblogs generate a large amount of text every day, much of which describes hot topics and events. Users urgently need a way to understand the hot spots and trends of social events from this mass of information. Intelligently integrating massive and complex microblog text, and extracting hot topics from it quickly and efficiently, therefore has practical significance and also helps enterprises better grasp the needs of users. This thesis designs a more efficient hot topic detection and prediction scheme, focusing on three problems: the numerical representation of microblog short text, the clustering of short text, and the evaluation and prediction of hot topics in microblog short text. The main work covers the following three aspects.

(1) After word segmentation, microblog text is high-dimensional and sparse, which makes the numerical representation of short text inaccurate. To address this, the thesis first obtains microblog data by combining the application programming interface (API) of the official microblog platform with a customized crawler algorithm, and preprocesses the collected dataset, including Chinese word segmentation of each microblog text and filtering of invalid stop words. It then combines the deep convolutional generative adversarial network (DCGAN) with One-Hot-HMM (OHH) to propose a DCGAN-based text feature model (T-DCGAN). In this model, a word mutual information (MIW) algorithm first calculates the information relevance between words in microblog text. Next, a VSM-based algorithm (SW-VSM) is designed to produce a preliminary word-matrix representation of the text. DCGAN then learns from the microblog word vector matrix to obtain the feature representation of the text. The T-DCGAN model can improve the accuracy of the numerical representation of microblog text.

(2) The traditional Kmeans algorithm is sensitive to the initial center points it is given, which weakens hot topic detection on microblogs. To address this, the thesis first designs an item space saving data (IDSS) algorithm to count the frequent items among microblog entries, and then proposes a microblog text clustering algorithm based on space-saving distance (SSDKmeans). Finally, a heat statistics algorithm based on the time span factor of microblog information (ITFH) is designed to compute the heat of each topic in the topic set. This scheme can effectively find hot topics in microblog text content.

(3) Trend prediction for hot microblog topics is not yet accurate enough. To address this, the thesis proposes a topic prediction scheme based on a nonlinear conditional random field (NLCRF), a probabilistic graphical model. A graph clustering algorithm (MMGC) is used to obtain the set of hot topics in microblogs. A prediction algorithm for the topic front area (PTFA) and a prediction algorithm for the topic back area (PTBA) are then designed to obtain the heat trend of these hot topics, yielding a set of state sequences for them. The model can effectively predict the heat trend of hot microblog topics.

Finally, a large number of experiments on a real microblog dataset show that the proposed scheme discovers and predicts microblog hot spots more accurately and efficiently than the traditional scheme. The scheme designed in this thesis can therefore intelligently find hot topics in rapidly generated, massive microblog text and accurately predict the heat trend of those topics, so that the latest hot topics of social discussion and trends in public opinion can be learned in a more timely and efficient way and unnecessary risks avoided.
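The frequent-item counting step in part (2) can be illustrated with the classic Space-Saving algorithm for bounded-memory frequency estimation. The abstract does not specify how IDSS is defined, so the sketch below is the textbook algorithm and not the thesis's variant; the names `space_saving`, `top_items`, and `capacity` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, List, Tuple


def space_saving(stream: Iterable[str], capacity: int) -> Dict[str, Tuple[int, int]]:
    """Textbook Space-Saving summary (an assumption, not the thesis's IDSS).

    Returns a map: item -> (estimated count, maximum overestimation error).
    At most `capacity` items are tracked at any time.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    counters: Dict[str, Tuple[int, int]] = {}
    for item in stream:
        if item in counters:
            # Already monitored: increment its estimated count.
            count, error = counters[item]
            counters[item] = (count + 1, error)
        elif len(counters) < capacity:
            # Free slot: start an exact counter.
            counters[item] = (1, 0)
        else:
            # Table is full: evict the item with the smallest count.
            # The newcomer inherits that count as its possible overestimate.
            victim = min(counters, key=lambda key: counters[key][0])
            min_count, _ = counters.pop(victim)
            counters[item] = (min_count + 1, min_count)
    return counters


def top_items(counters: Dict[str, Tuple[int, int]], k: int) -> List[Tuple[str, int]]:
    """Return the k items with the largest estimated counts (ties broken by name)."""
    ranked = sorted(
        ((item, count) for item, (count, _) in counters.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )
    return ranked[:k]
```

Calling `space_saving(tokens, capacity=100)` on a stream of segmented words would track at most 100 candidate terms. Each estimated count is an upper bound on the true count, and subtracting the stored error gives a lower bound. One plausible use, again an assumption, is to choose initial cluster centers from texts containing the most frequent terms instead of choosing them at random.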
Keywords/Search Tags: Topic Discovery, Text Clustering, Frequent Set, DCGAN, Trend Prediction, CRF