
Characteristic Analysis And Classification Detection Of Online Social Network Users

Posted on: 2017-01-25
Degree: Master
Type: Thesis
Country: China
Candidate: K Z Feng
Full Text: PDF
GTID: 2308330482480516
Subject: Software engineering
Abstract/Summary:
In recent years, with the continuous development of Internet technology, online social networking sites have gradually become an indispensable means of communication in people's lives. Compared with real-world social life, information in online social networks spreads faster, reaches more people, and involves more frequent interaction. Microblogs have become a widely used carrier of information and an important channel of interaction. Over time, microblog platforms have given rise to different kinds of users, including zombie accounts, sock-puppet ("vest") accounts, and trash accounts. These accounts serve different purposes and disrupt the order of the microblog platform. The number of fans is an important measure of a microblog user's reputation and popularity, so it attracts more and more attention, especially regarding its authenticity. Driven by economic interests, speculators have created large numbers of machine accounts, triggering a crisis of confidence. Machine accounts evolved from zombie accounts and share many similarities with them: both are generated by computer programs, and both exist to serve the fan-selling business. The difference is that the evolved accounts are more active and behave more like real accounts. As a result, Sina microblog cannot detect and ban this kind of account. Identifying machine accounts quickly and efficiently has become an urgent problem in maintaining the order of the microblog platform.

To explore this question, this study applies machine learning to the characteristics of Chinese microblog users, taking Sina microblog as an example. It also studies and analyzes the relevant classification algorithms in detail and puts forward an optimal classification model. The specific work is as follows:

1. Data collection. This part introduces the steps of data acquisition. First, the data are divided into a machine user set and a non-machine user set, collected from popular topics and from fan-purchasing services ("buy powder" in the literal translation). Then the principles of invoking the Sina API are introduced in detail. Finally, a combined data-extraction method is used to obtain the users' basic information and messages, which constitute the original data set of this research.

2. Characteristic analysis. First, several characteristics of Sina microblog are combined to obtain thirteen original features. The study then analyzes the relationships between users, the users' behaviors, and their messages to process the original features, yielding nine effective features. Turning these into a vector gives the input to the classifiers. Finally, cumulative distribution function plots are used to analyze the features.

3. Optimal classification model. First, the SVM algorithm is applied to verify the effectiveness of the features. A BP neural network and a decision tree, both frequently applied to user classification, are then used to classify the users, and their results are compared with those of SVM in search of a better classification effect. Because the results are unsatisfactory, the random forest algorithm is introduced to improve the classification effect. Then, to further improve classification efficiency, the SBS method is adopted to find the optimal combination of features. The resulting optimal classification model proposed in this study is named SBS-Random Forest.

4. Verification example. A user is selected at random, the information of that user's fans is obtained with the combined data-extraction method, and the fan feature set is generated. SBS-Random Forest is then used to classify these fans. Compared with labels obtained by manual voting, the classification results are close, and they give the proportion of machine fans among all of the user's fans. Finally, a comparison with other methods shows that the classification model proposed in this study is effective and feasible, and that the study has positive significance for identifying machine accounts.
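The feature-selection step in the optimal classification model can be illustrated with a minimal sketch of sequential backward selection (SBS), the usual expansion of the abbreviation. This is not the thesis's implementation. The data are synthetic (three made-up features, only the first one informative), all function names are hypothetical, and a simple nearest-centroid scorer stands in for the random forest so the example runs with the Python standard library alone.

```python
import random
from statistics import mean


def centroid_accuracy(train, test, cols):
    """Accuracy of a nearest-centroid classifier using only feature columns `cols`.

    Stand-in scorer (assumption): the thesis pairs SBS with random forest.
    """
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[c] for r in rows) for c in cols]

    def predict(x):
        # Pick the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])),
        )

    return mean(1.0 if predict(x) == y else 0.0 for x, y in test)


def sbs(train, test, n_features, k_min=1, score=centroid_accuracy):
    """Sequential backward selection.

    Start from all features, repeatedly drop the single feature whose removal
    gives the highest score, and return the best-scoring subset seen.
    Ties favour the smaller subset.
    """
    current = tuple(range(n_features))
    best_subset, best_score = current, score(train, test, current)
    while len(current) > k_min:
        candidates = [tuple(c for c in current if c != drop) for drop in current]
        scored = [(score(train, test, cand), cand) for cand in candidates]
        top_score, current = max(scored, key=lambda t: t[0])
        if top_score >= best_score:
            best_subset, best_score = current, top_score
    return best_subset, best_score


# Synthetic data (assumption): feature 0 separates the classes, features 1-2 are noise.
rng = random.Random(0)


def make_rows(n):
    rows = []
    for i in range(n):
        label = i % 2  # 0 = non-machine, 1 = machine (illustrative labels)
        x = [10.0 * label + rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1)]
        rows.append((x, label))
    return rows


train, test = make_rows(200), make_rows(100)
subset, acc = sbs(train, test, n_features=3)
print("selected feature indices:", subset, "accuracy:", acc)
```

To approximate the SBS-Random Forest setup, the `score` argument could be replaced with a function that trains and evaluates a random forest on the chosen columns. The search loop itself would not change.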
Keywords/Search Tags: Social networks, Microblog, Machine user, Data mining, User behavior analysis, Machine learning