Research On The Optimization Of Feature Decision Clustering Algorithm For Detecting Spam Users In The System

Posted on: 2022-02-01; Degree: Master; Type: Thesis
Country: China; Candidate: T Xiao
GTID: 2518306524451794; Subject: Electronics and Communications Engineering
Abstract/Summary:
Online social networks (OSNs) have a huge user base that attracts users from different industries and age groups. Although most OSNs are used mainly for benign purposes, their openness, large user base, and real-time message propagation make them profitable targets for cybercriminals. OSNs have proven to be a new incubator for sophisticated attacks and threats such as cyberbullying, rumor spreading, online fraud, and other illegal activities. Sina Weibo is a very popular online social network and has become an important platform for information dissemination and communication in people's social lives. Its massive volume of microblog data contains a great deal of valuable information, but in recent years a large number of spam users have appeared on the platform, spreading various types of spam through various channels. This not only undermines Weibo data mining and decision analysis, but also seriously harms the healthy development of the Weibo platform and the user experience. In addition, as Weibo's functions are continually updated, user feature dimensions have become more complex, which makes effective features difficult to extract and leads to unsatisfactory classification accuracy and high complexity. At the same time, multi-dimensional user data contain substantial redundancy, and irrelevant features directly degrade subsequent classification performance and can even increase complexity.

Addressing both feature selection and classification of online social network user data, this thesis proposes a feature decision clustering algorithm (FDCA), which consists of two stages.

Preprocessing stage: a blacklist-associated clustering algorithm, built on a new clustering framework, identifies clusters of users performing malicious tasks in a network interaction data set. First, an undirected ID-ID graph is constructed using a defined similarity measure. Next, a statistical correlation measure quantifies the correlation between the blacklist and each user ID, and the given blacklist is used to find the optimal threshold for deleting weakly correlated edges. Finally, each ID cluster is checked for whether its standardized residual exceeds 3, which yields the malicious clusters highly correlated with the blacklist. Preprocessing quickly eliminates a large number of spam accounts, after which user features are used to cluster the remaining users further.

Feature decision clustering stage: a fuzzy C-means objective function with feature-weighted entropy is used to construct a learning scheme for the parameters. The weight of each feature is computed over multiple iterations, irrelevant or redundant feature components are removed, and features are selected according to these decisions. The membership function, cluster centers, and feature weights are updated iteratively until the objective is optimized, and spam user clusters are finally identified with high precision.

To verify the effectiveness of the algorithm, a labeled Weibo user data set was used as the simulation data set on the Python platform. The first experiment simulates preprocessing, including the search for the optimal threshold and an analysis of the preprocessing results. The second analyzes the convergence of the objective function and the performance indices. The third compares clustering performance: the data set is split into three versions with different ratios of positive to negative samples, and the proposed algorithm is compared with the SDAFS, ELAFC, and NADMB algorithms on each version using four performance indicators. The fourth analyzes feature selection: it simulates the distribution of the feature weights selected by FDCA and applies statistical significance analysis to verify whether the differences between users in the retained features are random, thereby validating the algorithm's feature selection. The last examines how the number of features affects the classification performance of the different algorithms.

The simulation results show that FDCA improves on the four main performance indicators and that the feature selection embedded in the algorithm effectively reduces time complexity while maintaining high classification accuracy. FDCA also maintains good classification performance when many redundant features are present, demonstrating good robustness.
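The preprocessing stage can be illustrated with a minimal Python sketch. It assumes that each user ID maps to a set of interaction targets (for example, reposted message IDs), uses Jaccard similarity as a stand-in for the thesis's own similarity measure, keeps edges at or above a fixed threshold instead of searching for the optimal one, and scores each connected component with the Pearson residual (O - E) / sqrt(E) of its blacklisted members. All function names, the data layout, and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty); an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_components(interactions, threshold):
    """Link users whose similarity >= threshold and return connected components (union-find)."""
    parent = {u: u for u in interactions}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Weak edges (similarity below the threshold) are simply never added
    for u, v in combinations(interactions, 2):
        if jaccard(interactions[u], interactions[v]) >= threshold:
            parent[find(u)] = find(v)
    groups = {}
    for u in interactions:
        groups.setdefault(find(u), []).append(u)
    return list(groups.values())


def standardized_residual(cluster, blacklist, n_total, n_black):
    """Pearson residual (O - E) / sqrt(E) for the number of blacklisted members of one cluster."""
    observed = sum(1 for u in cluster if u in blacklist)
    expected = len(cluster) * n_black / n_total  # count expected if blacklisted users were spread evenly
    return (observed - expected) / sqrt(expected) if expected > 0 else 0.0


def malicious_clusters(interactions, blacklist, threshold, cutoff=3.0):
    """Return multi-member components whose residual exceeds the cutoff (3 in the abstract)."""
    comps = build_components(interactions, threshold)
    n_total = len(interactions)
    n_black = sum(1 for u in interactions if u in blacklist)
    return [c for c in comps
            if len(c) > 1 and standardized_residual(c, blacklist, n_total, n_black) > cutoff]


# Toy data: 40 benign users with disjoint activity, 6 accounts pushing the same promotional posts
interactions = {f"user{i}": {f"post{i}"} for i in range(40)}
for j in range(6):
    interactions[f"spam{j}"] = {"promo1", "promo2", "promo3", f"extra{j}"}
blacklist = {f"spam{j}" for j in range(5)} | {"user0"}

print(malicious_clusters(interactions, blacklist, threshold=0.5))
```

In the toy data only the six promotional accounts form a multi-member component (pairwise Jaccard 0.6); five of its six members are blacklisted against an expected count of about 0.78, giving a residual of about 4.8, so it is the only cluster reported. The pairwise loop is quadratic in the number of users and is meant only for small examples.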
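The feature decision clustering stage can be sketched in NumPy as a fuzzy C-means variant with entropy-regularized feature weights. The exact objective is not given in the abstract, so this sketch assumes a common form, J = sum_i sum_k u_ik^m sum_j w_j (x_ij - v_kj)^2 + gamma sum_j w_j ln w_j, with each user's memberships summing to 1 and the feature weights summing to 1. Minimizing J gives the usual FCM membership and center updates computed with weighted distances, plus the weight update w_j proportional to exp(-E_j / gamma), where E_j is the fuzzy within-cluster dispersion of feature j. The maximin initialization, the value of gamma, and the pruning rule in the hypothetical `select_features` helper (keep features whose weight is at least half the uniform weight 1/d) are illustrative assumptions, not the thesis's settings.

```python
import numpy as np


def _init_centers(X, c, rng):
    """Maximin initialization: a random first center, then repeatedly the point farthest from those chosen."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(c - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx].astype(float)


def weighted_entropy_fcm(X, c, m=2.0, gamma=1.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-means with entropy-regularized feature weights (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    V = _init_centers(X, c, rng)            # (c, d) cluster centers
    w = np.full(d, 1.0 / d)                 # start from equal feature weights
    prev_obj = np.inf
    for _ in range(max_iter):
        # Membership update: u_ik proportional to D_ik^(-1/(m-1)), D = weighted squared distance
        sq = (X[:, None, :] - V[None, :, :]) ** 2      # (n, c, d)
        D = sq @ w + 1e-12                             # (n, c); epsilon avoids division by zero
        inv = D ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: fuzzy weighted mean of the users
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Weight update: w_j proportional to exp(-E_j / gamma)
        sq = (X[:, None, :] - V[None, :, :]) ** 2
        E = np.einsum("ik,ikj->j", Um, sq)             # per-feature fuzzy dispersion
        z = -E / gamma
        z -= z.max()                                   # shift for numerical stability
        w = np.exp(z)
        w /= w.sum()
        # Stop when the objective J stops changing
        obj = np.sum(Um * (sq @ w)) + gamma * np.sum(w * np.log(w + 1e-300))
        if abs(prev_obj - obj) < tol:
            break
        prev_obj = obj
    return U, V, w


def select_features(w, ratio=0.5):
    """Hypothetical pruning rule: keep features whose weight is at least ratio / d."""
    return np.flatnonzero(w >= ratio / len(w))


# Synthetic example: features 0-1 separate two groups, features 2-4 are pure noise
data_rng = np.random.default_rng(1)
informative = np.vstack([data_rng.normal(0.0, 0.3, size=(50, 2)),
                         data_rng.normal(5.0, 0.3, size=(50, 2))])
noise = data_rng.normal(0.0, 1.0, size=(100, 3))
X = np.hstack([informative, noise])
U, V, w = weighted_entropy_fcm(X, c=2)
print(np.round(w, 3), select_features(w))
```

On this synthetic data the two informative features absorb almost all of the weight, because their within-cluster spread is far smaller than that of the noise features, so only indices 0 and 1 survive pruning. Since the weights depend on exp(-E_j / gamma), gamma has to be tuned to the scale of the data; standardizing features beforehand is usually advisable.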
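The significance check on the retained features can be illustrated with a two-sample permutation test on the difference in means between spam and normal users for a single feature. The abstract does not name the test used in the thesis, so this choice, the `permutation_pvalue` name, the default of 5,000 permutations, and the synthetic feature values are assumptions. A small p-value indicates that the observed difference is unlikely to arise from a random assignment of users to the two groups.

```python
import numpy as np


def permutation_pvalue(spam, normal, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means of one feature (sketch)."""
    rng = np.random.default_rng(seed)
    spam = np.asarray(spam, dtype=float)
    normal = np.asarray(normal, dtype=float)
    observed = abs(spam.mean() - normal.mean())
    pooled = np.concatenate([spam, normal])
    k = len(spam)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic
        shuffled = rng.permutation(pooled)
        if abs(shuffled[:k].mean() - shuffled[k:].mean()) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive
    return (hits + 1) / (n_perm + 1)


# Synthetic values of one feature for spam and normal users
demo_rng = np.random.default_rng(2)
spam_feature = demo_rng.normal(2.0, 1.0, size=30)
normal_feature = demo_rng.normal(0.0, 1.0, size=30)
print(permutation_pvalue(spam_feature, normal_feature))
```

When many features are tested this way, the p-values should be adjusted for multiple comparisons, for example with a Bonferroni correction.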
Keywords: Feature selection, Spammer detection, Online social network, Fuzzy clustering