
Research On The Identification Of The Water Army On Weibo

Posted on: 2021-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: J C Yang
GTID: 2438330611459045
Subject: Computer system architecture
Abstract/Summary:
Nowadays, people's lives are inseparable from the Internet, and social networks have become an essential part of society. The rapid development of online social networks in China, represented by Weibo, has given rise to the Weibo water army (organized paid posters), a new form of online water army in China. Exploiting the platform's large user base and the vast amount of information it carries, the Weibo water army spreads rumors or misleads users by reposting, liking, or commenting on selected posts, which seriously impairs users' ability to judge the authenticity of information. Effective, real-time identification of these water army groups is therefore of great research significance for protecting users' interests and fostering a harmonious online society.

Taking into account how the Weibo water army has evolved in recent years, the time constraints on recognition, and the problems the algorithm must solve, this thesis designs a recognition model based on a parallel weighted random forest algorithm running on the Spark platform.

Because some features used in existing research have lost part of their classification power, the existing water army features are cleaned and new features are screened to form the final feature set. First, the existing features are tested and those whose classification effect has degraded are removed; next, new features with good classification effect are selected through empirical observation and experiments; finally, the remaining features are merged with the new feature set to identify water army accounts.

Because Weibo has a huge number of users, modeling and recognition must continuously read and update massive amounts of user data, which is very time-consuming. Therefore, building on recognition with the combined feature set, the thesis adopts Spark, a distributed parallel computing framework based on in-memory computation, and parallelizes the random forest algorithm, which performs well on big-data classification, onto the Spark platform. However, the traditional random forest cannot evaluate the classification ability of individual decision trees. To reduce the influence of poorly performing trees on the final vote, each decision tree is assigned a weight: the AUC of each tree is computed on its out-of-bag data, better-performing trees receive larger weights, and the final classification is obtained by weighted voting.

Experimental results show that re-screening effective features and adding new ones for combined recognition maintains accuracy, while the parallelized random forest model reduces training and recognition time and improves overall efficiency. The model achieves good speedup and parallel efficiency, enabling fast and accurate identification of massive numbers of users, and its classification F-value exceeds 93%.
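The abstract does not say how degraded features are detected. As a minimal sketch, assuming each feature is scored by its single-feature AUC against the water army label, the cleaning step could look like the following. The helper names (`auc`, `screen_features`), the feature names, the toy data, and the 0.6 threshold are all hypothetical illustrations, not the thesis's method.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUC: P(positive score > negative score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def screen_features(rows, labels, names, threshold=0.6):
    """Keep features whose single-feature AUC is far enough from 0.5 (random guessing).

    max(a, 1 - a) makes the test direction-agnostic: a feature that is
    consistently lower for water army accounts is as useful as one that is higher.
    The threshold is a hypothetical choice.
    """
    kept = []
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        a = auc(column, labels)
        if max(a, 1.0 - a) >= threshold:
            kept.append(name)
    return kept


# Hypothetical toy data: label 1 = water army account, 0 = ordinary user.
names = ["followers", "reposts_per_day", "noise"]
rows = [
    [10, 50, 3], [12, 40, 7], [8, 60, 5],    # water army
    [300, 2, 4], [500, 1, 6], [250, 3, 5],   # ordinary users
]
labels = [1, 1, 1, 0, 0, 0]
retained = screen_features(rows, labels, names)
print(retained)  # ['followers', 'reposts_per_day']
```

The uninformative `noise` column scores an AUC of exactly 0.5 and is dropped; the retained names would then be merged with the newly screened features to form the final set.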
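The tree-weighting step described above can be sketched in a few lines of plain Python. This assumes each tree has already scored its own out-of-bag samples and that a tree's weight is simply its OOB AUC normalized over the forest; the abstract does not give the exact weighting formula, so `tree_weights`, `weighted_vote`, and the toy numbers are hypothetical, and the Spark distribution of the forest is not shown.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney estimate of AUC (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tree_weights(oob_results: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> List[float]:
    """One (scores, labels) pair per tree, computed on that tree's out-of-bag samples.

    Assumption: weight proportional to OOB AUC, normalized to sum to 1.
    """
    aucs = [auc(scores, labels) for scores, labels in oob_results]
    total = sum(aucs)
    if total == 0:
        raise ValueError("all trees have OOB AUC 0")
    return [a / total for a in aucs]


def weighted_vote(tree_votes: Sequence[int], weights: Sequence[float]) -> int:
    """Combine per-tree class votes (0 = ordinary, 1 = water army); ties go to 0."""
    score = {0: 0.0, 1: 0.0}
    for vote, w in zip(tree_votes, weights):
        score[vote] += w
    return 1 if score[1] > score[0] else 0


# Toy OOB results for three hypothetical trees (labels: 1 = water army).
oob = [
    ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]),  # AUC 1.0: ranks perfectly
    ([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]),  # AUC 0.5: no better than chance
    ([0.3, 0.6, 0.7, 0.4], [1, 1, 0, 0]),  # AUC 0.25: worse than chance
]
weights = tree_weights(oob)
votes = [1, 0, 0]  # how the three trees classify one new user
print(weighted_vote(votes, weights))  # 1, whereas a plain majority vote gives 0
```

Here the single strong tree (weight 1/1.75) outvotes the two weak trees combined (0.75/1.75). Proportional weighting never silences a weak tree entirely; a stricter variant such as weighting by max(AUC - 0.5, 0) would zero out trees that are no better than chance.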
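For reference, speedup, parallel efficiency, and the F-value are conventionally defined as below, where T_1 is the run time on one node, T_p the run time on p nodes, and P and R the precision and recall for the water army class. The abstract does not state its exact definitions, so these are the standard forms rather than the thesis's own.

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}, \qquad
F = \frac{2PR}{P + R}, \quad
P = \frac{TP}{TP + FP}, \quad
R = \frac{TP}{TP + FN}
```

As a hypothetical illustration, a job taking 120 s on one node and 40 s on four nodes has S_4 = 3 and E_4 = 0.75.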
Keywords/Search Tags: Random forests, Spark, Feature cleaning, Weibo water army user identification, Parallelization