Research And Application On User Relationships And Behavior Features In Social Networks

Posted on: 2014-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yu
Full Text: PDF
GTID: 2180330461472549
Subject: Computer software and theory
Abstract/Summary:
In recent years, Internet services such as short message services, social network services, and e-commerce have become widely used and popular. These services greatly benefit communication between users. They connect user groups in particular ways to form a social network in which nodes denote users and edges represent relationships. However, because of the large number of users and the complexity of their relationships, such a network can have hundreds of millions of nodes and edges, which makes analyzing its data patterns very challenging. To address this problem, this thesis proposes several algorithms to compute user relationships and extract user behavior features. Based on short message data provided by a telecom company in one province, a short message social network is constructed and the proposed algorithms are applied to compute the relationships between short message users. Behavior features are then extracted from the analysis of user relationships to build an offline spam short message filter. Feature selection is used to improve filtering accuracy. In addition, an optimization is designed to remove the efficiency bottleneck that big data causes in the offline filter.

User relationships in a social network are represented by the shortest distance between nodes: the shorter the distance, the closer the relationship, and the longer the distance, the weaker the relationship. Because classical exact shortest-distance methods are not suitable for large-scale social networks, this thesis studies landmark-based shortest-distance methods and proposes a new landmark selection strategy.
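
The general idea behind landmark-based distance estimation can be sketched as follows. This is a minimal illustration of the standard triangle-inequality upper bound, not the selection strategy proposed in the thesis: it assumes an undirected, unweighted graph, takes the landmark set as given, and all function names are hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def build_landmark_index(graph, landmarks):
    """Precompute one BFS per landmark; this is the offline cost."""
    return {lm: bfs_distances(graph, lm) for lm in landmarks}

def estimate_distance(index, u, v):
    """Upper bound on d(u, v): the minimum over landmarks l of d(u, l) + d(l, v).

    Assumes an undirected graph, so d(u, l) equals d(l, u).
    Returns None if no landmark reaches both nodes.
    """
    best = None
    for dist in index.values():
        if u in dist and v in dist:
            candidate = dist[u] + dist[v]
            if best is None or candidate < best:
                best = candidate
    return best

# Toy undirected path graph: a - b - c - d - e
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
index = build_landmark_index(graph, ["c"])

exact_case = estimate_distance(index, "a", "e")  # landmark lies on the shortest path
loose_case = estimate_distance(index, "a", "b")  # landmark lies off the shortest path
```

On this toy graph the estimate for (a, e) equals the true distance because the landmark c lies on the shortest path, while the estimate for (a, b) overshoots. This gap is why the choice of landmarks affects both accuracy and the number of precomputed searches.
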
Unlike existing landmark selection strategies, the proposed strategy converts the landmark selection problem into a combinatorial optimization problem and defines a multi-objective optimization model with one constraint for it. The conflict between accuracy and efficiency, which the two optimization objectives respectively represent, is discussed, and the effect of the constraint in ensuring accuracy and eliminating abnormal solutions is analyzed. Further, based on a modified multi-objective particle swarm optimization that integrates the mutation and crossover operators of genetic algorithms, a suitable form of the optimization objectives that needs no additional constraints is defined, and the equivalence of the solutions is proved. Four large-scale real networks are used as experimental datasets. The experimental results show that the estimation error is very close to zero and that the proposed strategy improves both accuracy and efficiency compared with other strategies. Finally, short message users are classified as legitimate or spam according to their purpose in sending messages, the proposed landmark strategy is applied to compute the relationships between users, and the differences in user relationships and some regular patterns are summarized. The conclusion is that legitimate users have much closer relationships than spam users.

Based on the short message social network, user behavior patterns are constructed and used to distinguish legitimate users from spam users. The differences in user relationships imply that the two groups also behave differently. Therefore, the sending and receiving behavior patterns in the short message social network are analyzed, and class-discriminating behavior features are extracted from the perspective of user relationships and behavioral motivations. Both the filter method and the wrapper method are used to select features.
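
As a rough illustration of what sending and receiving behavior features might look like, the sketch below derives three per-sender statistics from a list of (sender, receiver) records. The abstract does not list the actual features, so the three shown here (messages sent, distinct recipients, reciprocity) and the function name are assumptions chosen only for illustration.

```python
from collections import defaultdict

def sender_features(records):
    """Compute hypothetical per-sender behavior features.

    records: iterable of (sender, receiver) pairs, one per message.
    """
    sent_count = defaultdict(int)
    recipients = defaultdict(set)
    for sender, receiver in records:
        sent_count[sender] += 1
        recipients[sender].add(receiver)

    features = {}
    for user, contacts in recipients.items():
        # A contact "replied" if it ever sent a message back to this user.
        replied = sum(1 for c in contacts if user in recipients.get(c, ()))
        features[user] = {
            "sent": sent_count[user],
            "distinct_recipients": len(contacts),
            "reciprocity": replied / len(contacts),
        }
    return features

# Toy log: alice and bob talk to each other; "spam" broadcasts and gets no replies.
records = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "bob"),
    ("spam", "alice"),
    ("spam", "bob"),
    ("spam", "carol"),
]
features = sender_features(records)
```

In this toy log the broadcasting sender reaches many distinct recipients with zero reciprocity, while the two mutual contacts have a reciprocity of 1.0. Features of this kind are candidates that a filter or wrapper method would then rank or select.
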
In the wrapper method, feature selection is treated as a combinatorial optimization problem, and the corresponding multi-objective optimization problem is defined. Multi-objective particle swarm optimization, combined with different classifiers, is used to solve this problem, and the optimal feature subset is determined by experiments comparing the filter method and the wrapper method. An offline spam message filter is then constructed to detect spam users among the short message users of the past week. A linear feature statistics algorithm (LFSA) is proposed to overcome the limits on both runtime and space.

The experiments use 2.5 billion short messages from about 2 months, provided by a partner telecom company in one province. The results show that the proposed model is effective and satisfies the partner's requirements, and it has been deployed.
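
The multi-objective treatment of feature selection mentioned in this abstract rests on Pareto dominance, which can be sketched as follows. The two objectives used here (classification error and number of selected features, both minimized) and the candidate values are assumptions made for illustration. The abstract does not state the objectives, and the sketch shows only the dominance test, not the particle swarm search itself.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly
    better in at least one objective (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    candidates: dict mapping a name to its tuple of objective values.
    """
    front = []
    for name, objectives in candidates.items():
        dominated = any(
            dominates(other, objectives)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

# Hypothetical feature subsets scored as (classification error, feature count).
candidates = {
    "A": (0.10, 8),
    "B": (0.12, 3),
    "C": (0.15, 5),  # worse than B on both objectives
    "D": (0.30, 1),
}
front = pareto_front(candidates)
```

Subset C drops out because B beats it on both objectives, while A, B, and D remain as mutually non-dominated trade-offs. A multi-objective optimizer maintains such a front, and a final subset is then chosen from it.
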
Keywords/Search Tags: social network, shortest distance, behavior feature, spam short message filter, multi-objective particle swarm optimization