Research On The Detecting Of Spammers In The Microblog Network

Posted on: 2016-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X T Cheng
Full Text: PDF
GTID: 2308330482479064
Subject: Information and Communication Engineering
Abstract/Summary:
With the arrival of the Internet age, social media represented by microblogs has become an important platform for disseminating trending information. At the same time, large numbers of active, profit-driven online spammers spread rumors and false information, which seriously disrupts normal network order and harms social harmony and stability. Current research on spam user identification in microblog networks faces the following problems: (1) how to discover online spammers during information dissemination; (2) spammers' increasingly sophisticated concealment strategies make it difficult for traditional identification methods based on content and behavior to cope with new types of spammers; (3) microblog user data is growing massive and high-dimensional, so existing spammer identification methods cannot meet the demand for efficient detection, and user features suffer from missing values.
To address these problems, this thesis first analyzes the factors affecting network information dissemination in terms of the microblog network structure and traditional information dissemination models, establishes a microblog information dissemination model based on local information, and confirms the existence of online spammers by analyzing the differences between normal information dissemination and abnormal dissemination driven by spammers. Second, by analyzing the users' relationship network, it proposes a spammer identification method based on graph features, which identifies microblog spammers more effectively. Finally, to handle massive, high-dimensional microblog user data within limited identification time and in the presence of missing values, it puts forward a spammer identification method based on MapReduce random forest to improve recognition efficiency.
The main work and achievements are as follows:
1. A microblog network information dissemination model based on local information is proposed. The model gives a fine-grained description of the dissemination process, makes it easy to observe how individual microblog users affect dissemination, and reveals abnormal dissemination disturbed by online spammers, laying the foundation for spammer identification.
2. Because traditional content- and behavior-based methods fail on new types of online spammers, a graph-feature-based spammer identification method is proposed. It builds a user relationship graph, extracts and quantifies graph features, and classifies users with machine learning, combining the graph features with user features and temporal features. Simulation results show that the accuracy and recall of spammer identification increase by 5% after adding graph features, which verifies the effectiveness of the graph features for identifying online spammers.
3. To address missing data and enormous data volume, a microblog spammer identification method based on MapReduce random forest is proposed. The randomness of the random forest method addresses the problems of over-fitting and missing values, and parallelizing the algorithm speeds up the processing of spammer data, meeting the challenge of massive numbers of spammers under limited time. Simulation results show that the algorithm achieves linear speed-up and effectively reduces identification time. Compared with RBF neural network and Bayesian network algorithms, it is more robust to missing data, and the accuracy of spammer identification can increase by 10%.
Keywords/Search Tags:microblog networks, spammer, information propagation model, relationship graph, machine learning, MapReduce, random forest
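As a rough illustration of what extracting features from a user relationship graph could look like, the sketch below computes a few per-user quantities from a list of follow edges. The feature names (`in_degree`, `out_degree`, `reciprocity`, `follower_ratio`) and the function `graph_features` are assumptions chosen for illustration; the abstract does not state which graph features the thesis actually uses.

```python
from collections import defaultdict

def graph_features(follow_edges):
    """Compute simple per-user features from directed (follower, followee) edges.

    Hypothetical feature set for illustration only; not the thesis's own.
    """
    followers = defaultdict(set)  # user -> users who follow them
    followees = defaultdict(set)  # user -> users they follow
    for src, dst in follow_edges:
        if src == dst:
            continue  # ignore self-follows
        followees[src].add(dst)
        followers[dst].add(src)

    features = {}
    for user in set(followers) | set(followees):
        out_deg = len(followees[user])
        in_deg = len(followers[user])
        mutual = len(followees[user] & followers[user])
        features[user] = {
            "in_degree": in_deg,
            "out_degree": out_deg,
            # share of the accounts this user follows that follow back
            "reciprocity": mutual / out_deg if out_deg else 0.0,
            # share of the user's links that are incoming
            "follower_ratio": in_deg / (in_deg + out_deg),
        }
    return features

# Toy graph: "s" follows many accounts but nobody follows back.
edges = [("a", "b"), ("b", "a"), ("s", "a"), ("s", "b"), ("s", "c")]
feats = graph_features(edges)
```

In this toy graph, the account `s` has a high out-degree and zero reciprocity, the kind of structural signal that could be passed to a classifier alongside user and temporal features.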