
Research On Identity Linkage Based On User Behaviors On Social Media

Posted on: 2019-10-27  Degree: Master  Type: Thesis
Country: China  Candidate: X Q Yu  Full Text: PDF
GTID: 2428330542496926  Subject: Computer Science and Technology
Abstract/Summary:
With the popularity of social media applications, a large amount of user behavior information is available online. Although most users are already privacy-aware and deliberately hide identity information, they often overlook the privacy leakage caused by random and dynamic behavior records. By analyzing the semantics of a user's behaviors, an attacker can infer sensitive information such as the user's interests, hobbies, and political inclinations, and use these characteristics to perform a user identity linkage attack. Identity linkage has many application scenarios, such as linking multiple accounts that belong to the same person on different platforms. A social media platform can use multi-platform information to profile users more accurately, and thereby serve ads or provide content services that increase user satisfaction and improve business benefits. This paper investigates the feasibility of using the topic information in user behaviors to conduct identity linkage attacks. The purpose is to remind users to pay more attention to their own behaviors and prevent privacy leakage.

Solving this problem poses several challenges. First, user behavior is influenced by a combination of platform characteristics, popular topics, and individual emotional states, so it is random, dynamic, and fragmented. Unlike static attributes such as user names and email addresses, which are strongly identifying and unique, behavior does not directly identify a user. Second, the forms and objects of user behaviors are platform-dependent. Behaviors differ greatly across platforms, and the complex semantics contained in massive behavior information are not directly comparable.

To address these challenges, this paper uses the topics of the content involved in user behaviors, which have a consistent form and summarize semantics at a high level. It proposes a topic-based latent vector modeling method that can measure the intrinsic semantic relevance between topics. Then, according to the interactions between a user and the topics, the intrinsic characteristic of the user is modeled as a vector in the same semantic space. Even if the topics in a user's behaviors change, the user characteristics obtained by the model remain stable as long as the topics carry similar semantic information. This paper proposes two novel objectives for learning vector representations of topics: (1) semantic compatibility between co-occurring topic pairs, which places topics with similar semantics closer together in the latent space and helps extract stable intrinsic characteristics from dynamic user behaviors; (2) consistency of the intrinsic characteristics of the same user, which uses seed users to acquire additional background knowledge about topic semantics and makes the semantic representation of topics more complete. When constructing the final objective function, this paper uses Noise Contrastive Estimation (NCE), which avoids the expensive computation of the normalization term in the objective function and greatly speeds up learning. For optimization, this paper adapts Adam to multi-objective optimization, which avoids conflicts between the different objectives and reduces the number of iterations. The user's intrinsic characteristic vector is modeled from the probability distribution of the user's behaviors over the topics, and user identities are linked by calculating the distance between characteristic vectors.

This paper uses two real datasets, from Zhihu and MovieLens, to verify the validity of the method. Its accuracy is clearly higher than that of the compared methods. To better understand what the model learns, this paper analyzes the semantics from the perspective of topics and of users. From the perspective of topics, it shows that the learned topic representations are semantically interpretable. From the perspective of users, it shows intuitively how the method helps identity linkage. Finally, the limitations of the model are discussed by analyzing failure cases, and the accuracy of identity linkage is further improved by introducing a confidence score for the model's outputs.
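
The final linkage step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the topic vectors below are hand-made 2-D values rather than embeddings learned with NCE, the user vector is assumed to be the probability-weighted average of topic vectors, and cosine distance is assumed as the distance measure (the abstract does not say which distance is used). All names (`TOPIC_VECS`, `ACCOUNTS_A`, `ACCOUNTS_B`, `user_vector`, `link_accounts`) are hypothetical.

```python
import math

# Hypothetical 2-D topic embeddings; in the thesis these would be learned.
TOPIC_VECS = {
    "film": [1.0, 0.1],
    "cinema": [0.9, 0.2],
    "cooking": [0.1, 1.0],
    "baking": [0.2, 0.9],
}

# Hypothetical topic distributions of accounts on two platforms.
ACCOUNTS_A = {
    "a1": {"film": 0.8, "cooking": 0.2},
    "a2": {"cooking": 0.9, "film": 0.1},
}
ACCOUNTS_B = {
    "b1": {"baking": 1.0},
    "b2": {"cinema": 0.7, "baking": 0.3},
}


def user_vector(topic_probs, topic_vecs):
    """Assumed user model: expected topic vector under the user's distribution."""
    dim = len(next(iter(topic_vecs.values())))
    vec = [0.0] * dim
    for topic, prob in topic_probs.items():
        for i, value in enumerate(topic_vecs[topic]):
            vec[i] += prob * value
    return vec


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def link_accounts(accounts_a, accounts_b, topic_vecs):
    """Map each account on platform A to its nearest account on platform B."""
    vecs_b = {name: user_vector(p, topic_vecs) for name, p in accounts_b.items()}
    links = {}
    for name, probs in accounts_a.items():
        vec_a = user_vector(probs, topic_vecs)
        links[name] = min(vecs_b, key=lambda n: cosine_distance(vec_a, vecs_b[n]))
    return links


if __name__ == "__main__":
    print(link_accounts(ACCOUNTS_A, ACCOUNTS_B, TOPIC_VECS))
```

In this toy data the two platforms share no topic names, yet accounts still match because semantically close topics ("film" and "cinema", "cooking" and "baking") have nearby vectors, which is the stability property the abstract describes.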
Keywords/Search Tags: Social Media, Privacy Issue, Identity Linkage, Intrinsic Characteristic, Embedding Method