Research On Anonymous User Identification

Posted on: 2020-03-20

Degree: Master

Type: Thesis

Country: China

Candidate: Q H Yuan

Full Text: PDF

GTID: 2428330590460637

Subject: Computer Science and Technology

Abstract/Summary:


User identification is essentially a de-anonymization problem. In a realistic identification task, the goal is generally to find, within a group of known (non-anonymous) users, the user most similar to an anonymous user, based on the anonymous user's behavior data. Behavior data refers to the traces of operations that users leave in various network and communication services. These traces often contain the users' own behavior patterns and reflect their preferences and habits in consuming a service. From a user's behavior data we can derive their behavior pattern, and then identify an anonymous user by matching behavior patterns.

This paper focuses on the general identification problem and explores methods that apply across identification scenarios. First, an identification method based on feature distribution histograms is studied. The correlation of behavior features along the time dimension is then introduced, and on this basis an identification method based on feature sequences is proposed. In this method, all feature sequences of a user's behavior features on the timeline are first obtained with the n-gram model. Next, the feature sequences are ranked by their TF (Term Frequency) value, and the resulting popularity-ordered set serves as the representation of the user's behavior pattern. Finally, a matching method for ordered sets is proposed: the feature sequence set of the anonymous user is matched against that of each known user, and the known user with the highest matching similarity is selected as the identification result.

The above methods are verified experimentally in three different realistic scenarios, and some common problems in the identification task are discussed. First, the experiments show that in all three scenarios the accuracy of the feature sequence method is never lower than that of the classical feature histogram method. In the user shopping and web browsing scenarios, accuracy increases by 10% and 7% respectively, and running time is reduced. Second, this paper examines a problem often encountered in realistic identification tasks, namely that little data is available for the anonymous user. In this setting the feature sequence method performs better than the histogram method. Finally, the feature sequence method can be used to distinguish users with distinct features. Experiments show that accuracy can reach 98% on the user shopping and TV viewing data sets, so the identification results for such users can be trusted to a high degree, which is of great significance in some practical applications.
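The pipeline described in the abstract (n-gram feature sequences, TF ranking, ordered-set matching) can be illustrated with a minimal Python sketch. The abstract does not specify the matching measure, so `ordered_similarity` below is a hypothetical rank-weighted overlap chosen only for illustration, and histogram intersection stands in for the unspecified histogram comparison. All function names, the parameters `n` and `k`, and the toy event data are likewise assumptions rather than details taken from the thesis.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]

def histogram(events: Sequence[str]) -> Dict[str, float]:
    """Baseline: relative frequency of each feature, ignoring order."""
    counts = Counter(events)
    total = sum(counts.values())
    return {feature: c / total for feature, c in counts.items()}

def histogram_similarity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Histogram intersection (an assumed stand-in for the baseline measure)."""
    return sum(min(v, q.get(feature, 0.0)) for feature, v in p.items())

def ngrams(events: Sequence[str], n: int) -> List[Gram]:
    """All contiguous length-n feature sequences on the timeline."""
    return [tuple(events[i:i + n]) for i in range(len(events) - n + 1)]

def top_sequences(events: Sequence[str], n: int = 2, k: int = 10) -> List[Gram]:
    """Top-k n-grams ordered by term frequency; ties broken lexicographically."""
    counts = Counter(ngrams(events, n))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]

def ordered_similarity(a: List[Gram], b: List[Gram]) -> float:
    """Hypothetical rank-weighted overlap in [0, 1]; not the thesis's measure.

    A shared sequence contributes more when its ranks in a and b are close.
    """
    if not a or not b:
        return 0.0
    rank_in_b = {gram: j for j, gram in enumerate(b)}
    size = max(len(a), len(b))
    score = 0.0
    for i, gram in enumerate(a):
        j = rank_in_b.get(gram)
        if j is not None:
            score += 1.0 - abs(i - j) / size
    return score / size

def identify(anonymous: Sequence[str],
             known: Dict[str, Sequence[str]],
             n: int = 2, k: int = 10) -> str:
    """Return the known user whose ordered sequence set best matches."""
    target = top_sequences(anonymous, n, k)
    return max(
        known,
        key=lambda user: ordered_similarity(target, top_sequences(known[user], n, k)),
    )

# Toy data: both known users consume the same items, in opposite order.
known_users = {
    "alice": ["a", "b", "c", "a", "b", "c"],
    "bob": ["c", "b", "a", "c", "b", "a"],
}
anonymous_events = ["a", "b", "c", "a", "b"]

print(identify(anonymous_events, known_users))
```

In this toy data the two known users have identical feature histograms, so the order-free baseline cannot separate them, whereas their bigram sets differ completely. This is one way to picture why adding time-dimension correlation can help; it says nothing about the accuracy figures reported in the thesis.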

Keywords/Search Tags:

User Identification, De-anonymization, Feature Histogram, Time Series, n-gram


Related items

1	Design Of De-anonymization And Identification Algorithms In Social Networks
2	Research On Real-time Identification Method Of Data Stream Time Series Events
3	Identification Method Of Acoustic Emission Signal Based On Time Series Feature
4	Research On Feature Representation And Classification Methods In Time Series Data Mining
5	Person Re-identification Algorithm Combining Spatio-temporal Apparent Feature Fusion With Feature Matching
6	Research On Language Identification Of Social Media Short Text Based On N-Gram Vector Feature
7	Research And Implementation Of User Behavior Time Series Clustering
8	Research On Feature Representation And Clustering Method For Time Series
9	Research On Cloud Anonymization Technology Of User Data In Voice Assistant
10	Perceptually Important Points-based Futures Time Series Pattern Recognition And Its Application Research