Machine Learning-based Water Army Identification And Topic Influence Analysis Research Study

Posted on: 2019-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Gao
GTID: 2428330593450577
Subject: Software engineering
Abstract/Summary:
With the rapid development and spread of the Internet, Sina Weibo has become the largest comprehensive social media platform. As of June 2017, Sina Weibo's monthly active users reached 361 million, an increase of 28% over the same period of the previous year. However, this growth has been accompanied by a series of problems, including the emergence and rapid growth of the "water army" (paid posters) on the Weibo platform. The Weibo water army has seriously degraded the quality of Weibo topics: many untruthful topics interfere with netizens' thinking and with their judgment of topic trends, which in turn harms the social environment.

This thesis studies how the attributes of water army users differ from those of normal users and builds a recognition model for water army users based on an improved logistic regression algorithm. The model is used to filter out water army users and the Weibo posts they publish. Topic detection and topic-level Weibo influence analysis are then applied to the remaining posts to find the opinion leaders of each current topic, and thereby to follow the latest and most popular information on Weibo and the direction of public opinion.

The water army identification model is trained with the TensorFlow framework, combining the user features, behavioral attributes, and time features of the water army with the improved logistic regression algorithm. A comparison of experimental results shows that the improved method identifies the water army effectively.

After the water army is removed, topics are detected in the posts of the remaining normal users by combining the LDA probabilistic topic model with an improved Single-pass incremental clustering algorithm. The original Single-pass algorithm clusters inefficiently, depends on the input order of the texts, and, given the particular characteristics of Weibo posts, does not produce good topic results. The following improvements are therefore made to the Single-pass algorithm: 1) a time parameter is added to decide whether a post belongs to the same topic; 2) a cluster center is computed for each cluster, which avoids repeated pairwise similarity calculations between texts and improves clustering efficiency; 3) Weibo text data is input in batches to reduce the original algorithm's over-reliance on input order. The original and improved algorithms are tested separately, and the experimental results show that the improved Single-pass algorithm improves the efficiency and accuracy of topic detection.

Building on the PageRank algorithm, this thesis proposes a topic-level Weibo influence evaluation method and analyzes the detected topics to find the topic-level Weibo opinion leaders. Topic-level Weibo influence is related to three factors: 1) the activity level of the user; 2) the degree of attention received by the Weibo posts related to the topic; 3) the quality of the users who forward the topic's posts. Finally, specific examples are analyzed to illustrate the effectiveness and feasibility of the method.
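The three Single-pass modifications listed in the abstract (a time parameter, cluster centers, and batched input) could be sketched roughly as follows. This is only an illustrative reading, not the thesis's implementation: the names `Topic`, `single_pass`, `sim_threshold`, `max_gap`, and `batch_size` are hypothetical, posts are assumed to arrive as sparse term-weight vectors (for example derived from LDA or TF-IDF), cosine similarity is an assumed choice of measure, and sorting each batch by timestamp is just one possible interpretation of "input in batches".

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import Dict, List, Tuple

Vector = Dict[str, float]          # sparse term -> weight vector (assumed input form)
Post = Tuple[str, Vector, float]   # (post id, vector, timestamp in seconds)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Topic:
    centroid: Vector
    last_seen: float
    members: List[str] = field(default_factory=list)

    def add(self, post_id: str, vec: Vector, timestamp: float) -> None:
        # Update the running mean so later posts compare against one
        # centroid instead of against every member of the cluster.
        n = len(self.members)
        terms = set(self.centroid) | set(vec)
        self.centroid = {
            t: (self.centroid.get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
            for t in terms
        }
        self.members.append(post_id)
        self.last_seen = max(self.last_seen, timestamp)


def single_pass(posts: List[Post], sim_threshold: float = 0.5,
                max_gap: float = 3600.0, batch_size: int = 100) -> List[Topic]:
    topics: List[Topic] = []
    for start in range(0, len(posts), batch_size):
        # Batched input: order within a batch is normalised by timestamp
        # (an assumption about how batching reduces order dependence).
        batch = sorted(posts[start:start + batch_size], key=lambda p: p[2])
        for post_id, vec, ts in batch:
            best, best_sim = None, 0.0
            for topic in topics:
                # Time parameter: a topic that has been quiet for longer
                # than max_gap is not considered the same topic.
                if abs(ts - topic.last_seen) > max_gap:
                    continue
                sim = cosine(vec, topic.centroid)
                if sim > best_sim:
                    best, best_sim = topic, sim
            if best is not None and best_sim >= sim_threshold:
                best.add(post_id, vec, ts)
            else:
                topics.append(Topic(centroid=dict(vec), last_seen=ts,
                                    members=[post_id]))
    return topics
```

Comparing each post against one centroid per topic keeps the cost per post proportional to the number of topics rather than the number of posts seen so far, which is the efficiency gain the abstract attributes to the cluster-center change.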
Keywords/Search Tags:Logistic regression algorithm, Improved Single-pass incremental clustering algorithm, Impact Analysis, Water Army Identification, Topic Detection