Research On Micro-blog Rumors Recognition Based On Sensitive Thesaurus

Posted on: 2019-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: R R Lin
Full Text: PDF
GTID: 2428330545952985
Subject: Management information system
Abstract/Summary:
Scholars have carried out many experiments on rumor recognition for micro-blogs and Twitter. Traditional rumor detection mostly relies on feature engineering followed by supervised classification. The selected features are usually statistical, and the room for improving a single classifier is limited even after repeated parameter tuning. To address this, the thesis introduces improvements in two areas: feature extraction and classification algorithms.

For feature extraction, the thesis makes the following contributions. The first is to create a sensitive thesaurus and introduce features based on it. Existing work looks for traces of rumors in the statistical characteristics of micro-blogs (such as the number of words in a post or the number of fans). However, casual words, new words and unregistered words appear more and more frequently, and rumors are time-sensitive. The thesis therefore builds a real-time sensitive thesaurus from the content of rumors, which can improve the accuracy of rumor recognition. The micro-blog sensitive thesaurus is constructed with a Hot Sensitivity algorithm, the L-CPBL algorithm, together with multiple-word expansion. First, the seed word set is extracted. L-CPBL is a fast word extraction algorithm that needs no reference dictionary, and the improved LTC weight and location weight make the extraction of the seed word set more accurate. Then, a multiple expansion set is obtained using a feature space vector optimization model and a clustering algorithm. Finally, the seed word set and the multiple expansion set are combined to form the sensitive thesaurus, from which features are derived.

The second contribution is to introduce text semantic features. Recent research on micro-blog features concentrates mainly on statistical characteristics and ignores the semantic information of words and text. The thesis uses deep learning to extract features from the text content. These deep features cover text, semantics, grammar and context, and so capture the contextual information of rumors. The text semantic features are fused with the statistical features and the sensitive thesaurus features, and the combined features effectively improve classification accuracy.

For the classification algorithm, the thesis makes the following contributions. First, it introduces the GBRT ensemble classification algorithm, which combines Gradient Boosting (GB) with weak classifiers. Each new decision tree is optimized against the difference between the prediction of the previous decision tree and the true value, and the final output of the model is the sum of the outputs of all the decision trees. By its nature, GBRT can handle mixed types of data and can effectively discover and combine features.

Second, it introduces Long Short-Term Memory (LSTM) as a second-stage rumor recognizer. Experiments show that, after the deep features and the shallow statistical features are fused and classified with GBRT, semantically similar rumors are still clearly misclassified. Although such rumors are semantically similar, they differ greatly in vocabulary and statistical characteristics, so GBRT has difficulty recognizing them. LSTM is therefore used as a second recognition stage, because it has a memory function and its three special gates handle the long-term dependence problem well. Experimental results show that LSTM, taking the micro-blog text directly as input, can effectively improve the rumor recognition rate.
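The GBRT idea summarized in the abstract (each new tree is fitted to the gap between the current prediction and the true value, and the final output is the sum of all trees) can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes squared-error loss, uses one-split "stumps" as the weak learners, and runs on a made-up toy dataset. The feature names (sensitive-word count, follower count in thousands), the function names and the parameter values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stump:
    """A one-split regression tree (the weak learner in this sketch)."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, row):
        if row[self.feature] <= self.threshold:
            return self.left_value
        return self.right_value


def mean(values):
    return sum(values) / len(values)


def fit_stump(X, residuals):
    """Pick the single split that minimises squared error on the residuals."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put samples on both sides
            lv, rv = mean(left), mean(right)
            err = (sum((r - lv) ** 2 for r in left)
                   + sum((r - rv) ** 2 for r in right))
            if err < best_err:
                best, best_err = Stump(j, t, lv, rv), err
    return best


def fit_gbrt(X, y, n_trees=20, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = mean(y)                      # initial constant prediction
    preds = [base] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        trees.append(stump)
        preds = [p + learning_rate * stump.predict(row)
                 for p, row in zip(preds, X)]
    return base, trees, learning_rate


def predict_gbrt(model, row):
    """Final score = base value + scaled sum of every stump's output."""
    base, trees, learning_rate = model
    return base + learning_rate * sum(t.predict(row) for t in trees)


# Hypothetical toy data: [sensitive_word_count, followers_in_thousands]
X = [[3, 0.2], [4, 0.5], [5, 0.1], [0, 12.0], [1, 8.0], [0, 3.0]]
y = [1, 1, 1, 0, 0, 0]                  # 1 = rumor, 0 = non-rumor (made up)

model = fit_gbrt(X, y)
print(round(predict_gbrt(model, [4, 0.3]), 3))
print(round(predict_gbrt(model, [0, 10.0]), 3))
```

A score above 0.5 can be read as "rumor" in this toy setting. A real system would use deeper trees, a classification loss and the fused feature set described in the abstract.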
Keywords/Search Tags: Rumors Sensitive Thesaurus, Integrated Classification Algorithm, Feature Space, Long Short-Term Memory (LSTM)