
Weibo-oriented Rumor Identification And Statistical Law Research

Posted on: 2021-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Bo
Full Text: PDF
GTID: 2517306302472594
Subject: Applied Statistics
Abstract/Summary:
In the era of big data and the Internet, thousands of news items and messages are generated every moment. With the spread of mobile terminals such as smartphones, tablets and laptops, the public can access massive amounts of information from around the world without leaving home. Mixed into this all-encompassing flow of information is inevitably a large amount of negative content such as rumors, and the spread of rumors can cause social panic and instability. It is therefore particularly important to screen rumors in time and to take measures that curb them at an early stage.

Sina Weibo is currently the largest social media platform in China. Users can log in from computers, mobile phones and other terminals to share information and interact in real time through text, pictures, videos and other media. The number of daily active users is now close to 200 million. Because the platform includes social functions such as reposting and commenting, it makes interaction between users easy, and these same functions may accelerate the spread of rumors. In response, Sina has set up the "Weibo Community Management Center" to control the spread of rumors on Weibo. However, verification there is done manually by experts and other relevant personnel, so it cannot stop rumors from spreading in a timely manner, and the resulting harm cannot be mitigated in time.

Rumor research has a long history both in China and abroad, and social media platforms such as Twitter and Facebook also face the problem of how to prevent the spread of rumors in a timely and effective manner. Early work relied heavily on machine learning: features were engineered from three aspects, namely the rumor text, the profile of the publishing user, and the propagation characteristics, and were then fed into a machine learning model such as a random forest for classification. However, this approach requires considerable effort to extract and construct features by hand. Moreover, the text features used were only shallow ones, such as hyperlinks, emoticons or emotional words in the text, which cannot capture deeper features such as semantics, so the predictions of these classifiers were unsatisfactory.

In recent years, deep learning methods have become common in this research. A multi-layer neural network takes pre-trained word vectors of the raw text as input, applies suitable activation functions, trains and updates its weights, and finally outputs a prediction. Earlier researchers mostly used Recurrent Neural Networks (RNNs). An RNN takes a sequence as input and processes it in sequence order; its nodes are connected in a chain, so each node retains the information computed up to that point. However, RNNs suffer from the vanishing gradient problem during training: as gradients are back-propagated through many time steps they shrink exponentially, so the network struggles to learn long-range dependencies. Researchers therefore turned to an RNN variant, Long Short-Term Memory (LSTM), to mitigate the vanishing gradient problem. Each LSTM unit contains three gates: the forget gate decides, with a weight between 0 and 1, how much information from the previous cell state is discarded; the input gate decides how much of the current input is written into the cell state, which is then updated together with the retained information; and the output gate determines what information is produced and passed on to the next node.

Building on previous research, this thesis makes the following contributions:

1. Construct a dual-input model based on systematic statistical laws. For text data, previous work fed only the original microblog text into the neural network; here, the text input used to train the model also includes the comment text, because an analysis of word frequencies (after word segmentation) and of sentiment in the comments on rumors and non-rumors showed that the two have noticeably different distributions. For propagation data, in addition to the numbers of likes and favorites, which already differ between the two classes in the original dataset, the author also constructs features describing the propagation speed of the source Weibo post during its first hour and its first day. Based on this exploratory data analysis, a dual-input, single-output model with a Gated Recurrent Unit (GRU) as the baseline is built for detection.
2. Introduce the self-attention mechanism. With self-attention, a recurrent neural network is no longer used as the encoder. The core idea is to compute the correlation between each word in a sentence and every word in that sentence, use it to adjust the weight of each word, and thereby obtain a new representation for each word. Because each word attends to the context on both sides, the model is trained bidirectionally rather than only from left to right or from right to left. Compared with an RNN, it captures long-distance dependencies more efficiently. The BERT (Bidirectional Encoder Representations from Transformers) model is built on this mechanism.
3. Analyze an imbalanced dataset of epidemic rumors. The world is facing the threat of the novel coronavirus (COVID-19), and social platforms are also under attack from epidemic-related rumors. In this thesis, an imbalanced epidemic-related dataset was collected with a web crawler and used with deep learning models for demonstration, and the performance of different models on it was compared with their performance on a balanced public rumor dataset.
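The early feature-engineering approach relies on shallow surface cues such as hyperlinks, emoticons and emotional words. The sketch below counts such cues for one post; it is an illustration, not the thesis's feature set. The function name `shallow_text_features`, the tiny `EMOTION_WORDS` lexicon, and the assumption that Weibo emoticons appear as bracketed codes like `[哈哈]` are all hypothetical, and the English-only word pattern stands in for proper Chinese word segmentation.

```python
import re

# Hypothetical sentiment lexicon; a real system would use a curated Chinese lexicon.
EMOTION_WORDS = {"shocking", "terrible", "panic", "urgent"}

URL_PATTERN = re.compile(r"https?://\S+")
# Assumed format: Weibo emoticons rendered as bracketed codes such as "[哈哈]".
EMOTICON_PATTERN = re.compile(r"\[[^\[\]]+\]")
# English-only toy tokenizer; Chinese text would first need word segmentation.
WORD_PATTERN = re.compile(r"[a-z]+")


def shallow_text_features(text):
    """Count surface-level cues of the kind used by early feature-engineering detectors."""
    words = WORD_PATTERN.findall(text.lower())
    return {
        "num_urls": len(URL_PATTERN.findall(text)),
        "num_emoticons": len(EMOTICON_PATTERN.findall(text)),
        "num_emotion_words": sum(1 for w in words if w in EMOTION_WORDS),
    }


features = shallow_text_features("Shocking! [哈哈] see http://t.cn/A6x terrible news")
print(features)  # {'num_urls': 1, 'num_emoticons': 1, 'num_emotion_words': 2}
```

Counts like these would then be fed to a classifier such as a random forest. They show why the abstract calls these features shallow: two posts with the same counts can mean entirely different things.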
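The vanishing gradient problem in RNNs can be seen numerically with a scalar RNN $h_t = \tanh(w h_{t-1} + x_t)$: by the chain rule, $\partial h_T / \partial h_0$ is a product of $T$ factors $w(1 - h_t^2)$, and when each factor is below 1 the product shrinks exponentially with sequence length. The function name `gradient_through_time` and the toy values are illustrative assumptions.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for a scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t ** 2)
        grad *= w * (1.0 - h ** 2)
    return grad


# With w = 0.5 and zero inputs, every factor is exactly 0.5, so the gradient is 0.5 ** steps.
for steps in (1, 5, 10, 20):
    print(steps, gradient_through_time(0.5, [0.0] * steps))
```

After 20 steps the gradient reaching the first step is below one millionth, so early words in a long post barely influence the weight updates.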
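The three LSTM gates described above can be sketched as a single step of a standard LSTM cell with a scalar hidden state. This is the common textbook formulation (the thesis may use a variant), and the parameter names `W_*`, `U_*`, `b_*` and the function `lstm_step` are hypothetical. The sigmoid keeps each gate between 0 and 1, and the additive cell update `c = f * c_prev + i * g` is what lets information survive across many steps.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p holds weights W_*, U_* and biases b_* for each gate."""
    f = sigmoid(p["W_f"] * x + p["U_f"] * h_prev + p["b_f"])    # forget gate: share of c_prev kept
    i = sigmoid(p["W_i"] * x + p["U_i"] * h_prev + p["b_i"])    # input gate: share of new content written
    g = math.tanh(p["W_g"] * x + p["U_g"] * h_prev + p["b_g"])  # candidate content
    o = sigmoid(p["W_o"] * x + p["U_o"] * h_prev + p["b_o"])    # output gate: share of the cell exposed
    c = f * c_prev + i * g      # additive cell-state update
    h = o * math.tanh(c)        # hidden state passed to the next node
    return h, c


# Hypothetical all-zero parameters: every gate outputs 0.5 and the candidate is 0.
params = {f"{m}_{gate}": 0.0 for m in ("W", "U", "b") for gate in "figo"}
h1, c1 = lstm_step(1.0, 0.0, 1.0, params)
print(h1, c1)  # the cell keeps half of c_prev, so c1 == 0.5
```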
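The abstract reports that word frequencies in comments on rumors and non-rumors have different distributions, but it does not say how the difference was measured. One simple way to quantify it, sketched here as an assumption, is the total variation distance between the two relative-frequency distributions. The comments are assumed to be already word-segmented, and the function names and toy tokens are hypothetical.

```python
from collections import Counter


def word_distribution(token_lists):
    """Relative word frequencies over a collection of pre-segmented comments."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint vocabularies."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)


# Toy segmented comments (hypothetical), one list of tokens per comment.
rumor_comments = [["fake", "hoax"], ["fake", "really"]]
nonrumor_comments = [["official", "really"], ["confirmed"]]

p = word_distribution(rumor_comments)
q = word_distribution(nonrumor_comments)
print(total_variation(p, q))  # about 0.75: the classes share only the word "really"
```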
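The propagation-speed features for the first hour and first day can be sketched as reposts per hour inside each window after the source post. The abstract does not give the exact definition, so the choice of reposts per hour, the function name `propagation_speed_features`, and the toy timestamps are all assumptions.

```python
from datetime import datetime, timedelta


def propagation_speed_features(post_time, repost_times):
    """Hypothetical features: average reposts per hour in the first hour and the first day."""
    def count_within(window):
        return sum(1 for t in repost_times if post_time <= t < post_time + window)

    return {
        "speed_first_hour": count_within(timedelta(hours=1)) / 1.0,
        "speed_first_day": count_within(timedelta(days=1)) / 24.0,
    }


posted = datetime(2020, 2, 1, 8, 0)
# Toy reposts at 10 min, 50 min, 2 h, 23 h and 25 h after the source post.
reposts = [posted + timedelta(minutes=m) for m in (10, 50, 120, 23 * 60, 25 * 60)]
feats = propagation_speed_features(posted, reposts)
print(feats)  # 2 reposts in the first hour; 4 in the first day, i.e. 4 / 24 per hour
```

Normalizing by window length keeps the two features on the same "per hour" scale, so a post that spreads explosively in its first hour stands out against its daily average.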
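The dual-input, single-output structure can be sketched as a forward pass: a GRU encodes the word vectors of the source post plus its comments, the final hidden state is concatenated with the propagation features, and a sigmoid output gives a rumor probability. This is a minimal sketch under heavy assumptions: a scalar hidden state, scalar word embeddings, untrained hypothetical weights, and hypothetical names (`gru_encode`, `dual_input_predict`, `w_text`, `w_prop`, `b_out`). It uses the update convention `h = (1 - z) * h + z * h_tilde`; some libraries swap the roles of `z` and `1 - z`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_encode(embeddings, p):
    """Run a scalar GRU over a sequence of scalar word embeddings; return the last hidden state."""
    h = 0.0
    for x in embeddings:
        z = sigmoid(p["W_z"] * x + p["U_z"] * h + p["b_z"])                # update gate
        r = sigmoid(p["W_r"] * x + p["U_r"] * h + p["b_r"])                # reset gate
        h_tilde = math.tanh(p["W_h"] * x + p["U_h"] * (r * h) + p["b_h"])  # candidate state
        h = (1.0 - z) * h + z * h_tilde                                    # blend old and new
    return h


def dual_input_predict(text_embeddings, prop_features, p):
    """Hypothetical dual-input head: GRU text encoding plus propagation features, one output."""
    text_vec = gru_encode(text_embeddings, p)
    logit = (p["w_text"] * text_vec
             + sum(w * f for w, f in zip(p["w_prop"], prop_features))
             + p["b_out"])
    return sigmoid(logit)  # estimated probability that the post is a rumor


# Hypothetical parameters: GRU weights zero, only the first propagation feature counts.
params = {f"{m}_{g}": 0.0 for m in ("W", "U", "b") for g in "zrh"}
params.update({"w_text": 0.0, "w_prop": [1.0, 0.0], "b_out": 0.0})
prob = dual_input_predict([0.3, -0.2, 0.8], [2.0, 5.0], params)
print(prob)  # sigmoid(2.0), roughly 0.88
```

In a trained model the weights would be learned jointly, so the text branch and the propagation branch both contribute to the single output.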
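The self-attention idea, computing the correlation of each word with every word in the sentence and reweighting accordingly, can be sketched as scaled dot-product attention, $\mathrm{softmax}(QK^\top/\sqrt{d})\,V$. For brevity this sketch sets $Q = K = V = X$; the learned projection matrices, multiple heads and positional encodings used in Transformer models such as BERT are omitted, and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X (learned projections omitted)."""
    d = len(X[0])
    outputs, weights = [], []
    for q in X:
        # Correlation of this word with every word in the sentence, left and right alike.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        # New representation: attention-weighted average of all word vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, X)) for c in range(d)])
    return outputs, weights


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d embeddings for a 3-word sentence
outputs, weights = self_attention(X)
print(weights[0])  # word 0 attends equally to itself and word 2, less to word 1
```

Every position sees the whole sentence in one step, which is why no left-to-right ordering is imposed and why distant words are linked as directly as neighbouring ones.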
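On an imbalanced dataset, accuracy alone can be misleading, which is one reason to compare models on imbalanced and balanced data. The abstract does not state which metrics were used; the sketch below computes accuracy, precision, recall and F1 for the rumor class. The 1-in-10 toy split is made up for illustration and is not the thesis's data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for the positive (rumor) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy imbalanced set: 1 rumor among 10 posts; the classifier always predicts "non-rumor".
y_true = [1] + [0] * 9
y_pred = [0] * 10
metrics = binary_metrics(y_true, y_pred)
print(metrics)  # {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

A classifier that never flags a rumor still reaches 90% accuracy here, while recall and F1 expose that it is useless for rumor detection.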
Keywords/Search Tags: Rumor Detection, Rumor Spread, Attention Mechanism, Epidemic Rumors