| Online social networks have become an important platform for communication. As the largest Chinese community platform at present, Sina Weibo gives people a more convenient and efficient way to communicate, but it also harbors a large number of zombie accounts and other security risks. Zombie account service providers produce zombie accounts in bulk, either by registering them automatically with computer programs or by stealing other people's accounts, and then cultivate them with scripted routines. These accounts do not take part in normal social activity. Instead, they follow large numbers of users to gain mutual followers, use those connections to spread spam or paid-poster ("water army") comments, and can even sway public opinion. Zombie accounts thus seriously disrupt normal social interaction among users and also affect the overall atmosphere of the platform and the direction of online public opinion, so detecting them has become an urgent problem for social platforms. As anti-spam technology on social platforms keeps improving, zombie accounts constantly adjust their behavior patterns to evade the platforms' censorship and detection mechanisms. As a result, existing zombie account detection techniques have the following problems. First, current research does not distinguish between different categories of zombie accounts and so cannot adapt to their diversity. Second, the anti-detection methods of zombie accounts keep improving, and the accounts increasingly resemble normal accounts in personal information and behavior, so existing detection algorithms are less efficient, and hard-to-identify zombie accounts can only be caught through reports from normal users. Third, most existing detection algorithms analyze and detect zombie accounts using high-dimensional features, which limits both detection accuracy and detection
efficiency. Starting from the different categories of zombie accounts that exist, this thesis analyzes the differences and similarities among the types of zombie accounts and between zombie accounts and normal accounts, and studies feature selection and detection algorithms. The main research contents are as follows. First, to obtain zombie accounts and normal accounts, data acquisition methods based on honeypot pages and Weibo crawlers are proposed. Zombie and normal accounts are gathered through honeypot pages registered on the Sina Weibo platform and through popular microblog channels, and a crawler program then revisits the collected accounts to gather their basic information and microblog data, which form the original data sample. Second, to counter the anti-detection strategies of zombie accounts, abnormal features that distinguish zombie accounts from normal accounts are extracted on top of the existing features to improve detection efficiency. To keep an excessive feature dimension from degrading detection performance, a feature selection algorithm combining the PCA algorithm and the ReliefF algorithm, named PCA-RF, is designed to exploit the complementary strengths of feature extraction and feature selection. PCA is first applied to the original feature set of the microblog account samples to perform feature extraction, removing redundant features and mapping the high-dimensional data to a low-dimensional subspace. The ReliefF algorithm then samples the extracted feature set repeatedly, computing and updating the weight of each feature so that every feature in the sample is assigned a weight. Finally, the features with higher discriminative power are selected according to their weights to obtain the final optimal feature subset. Through comparative experiments with the Naive Bayes
algorithm, the Support Vector Machine algorithm, and the Random Forest algorithm, the validity of the proposed abnormal features is verified, and the Random Forest algorithm is found to classify zombie accounts well. Finally, based on the above algorithms, a prototype zombie account detection system built on PCA-RF is designed. The prototype system displays information related to zombie accounts and, on user request, automatically crawls account information and identifies the account category. It consists mainly of a data acquisition and preprocessing module, a feature extraction module, a zombie account detection module, and a Web module. |
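
The ReliefF weighting step summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes purely numeric features (for example, the components left after the PCA projection), a Manhattan distance over range-normalized features, and k nearest neighbors per class. The function names `relieff_weights` and `select_top`, the parameter defaults, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def relieff_weights(X, y, n_samples=None, k=3, seed=0):
    """Return one ReliefF weight per feature (higher = more discriminative).

    Assumptions of this sketch: numeric features only, Manhattan distance
    on range-normalised values, k nearest hits/misses per class.
    """
    n, d = len(X), len(X[0])
    rng = random.Random(seed)

    # Feature ranges normalise diff() into [0, 1]; constant features use 1.0.
    spans = []
    for a in range(d):
        col = [row[a] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / spans[a]

    def dist(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}

    # Sample instances without replacement; by default use every instance.
    if n_samples is None or n_samples >= n:
        picks = list(range(n))
    else:
        picks = rng.sample(range(n), n_samples)

    weights = [0.0] * d
    for i in picks:
        r, c = X[i], y[i]
        for cls, p in priors.items():
            # k nearest neighbours of r within class `cls` (excluding r).
            neigh = sorted(
                (j for j in range(n) if j != i and y[j] == cls),
                key=lambda j: dist(r, X[j]),
            )[:k]
            if not neigh:
                continue
            # Hits lower the weight; misses raise it, scaled by class prior.
            scale = -1.0 if cls == c else p / (1.0 - priors[c])
            for a in range(d):
                avg = sum(diff(a, r, X[j]) for j in neigh) / len(neigh)
                weights[a] += scale * avg / len(picks)
    return weights


def select_top(weights, n_keep):
    """Indices of the n_keep highest-weighted features, in ascending order."""
    order = sorted(range(len(weights)), key=lambda a: weights[a], reverse=True)
    return sorted(order[:n_keep])


# Made-up toy data: feature 0 separates the two classes, feature 1 does not.
X = [[0.1 * i, (i * 7 % 10) / 10] for i in range(10)]         # label 0: normal
X += [[5.0 + 0.1 * i, (i * 3 % 10) / 10] for i in range(10)]  # label 1: zombie
y = [0] * 10 + [1] * 10

weights = relieff_weights(X, y, k=3)
print(select_top(weights, 1))
```

On this toy data the class-separating feature receives the larger weight, which is the property the selection step relies on. In a full PCA-RF pipeline the rows of `X` would presumably be the PCA-projected account features rather than raw values.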