Research On Key Technology Of Linking User Identities In Internet

Posted on: 2016-10-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D Liu
GTID: 1318330536467110
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and its applications, a user may register on many different applications and provide different identity information to each. Linking the user identities that are controlled by the same owner across multiple sites has important practical significance. From a network security perspective, malicious users may use anonymous accounts for harmful behavior; linking user identities can help to discover such accounts and consequently reduce the security risks caused by account anonymity on the Internet. Moreover, the technique can be used for user feature analysis, decision support and other tasks in the commercial field. Against this background, this dissertation analyzes several key technologies for linking multiple identities that belong to the same user. The major contents and contributions of the dissertation are as follows:

1. Internet applications usually require their users to provide profile information, and the profiles that the same user registers in different applications may contain different categories of attributes. We therefore propose a methodology for linking multiple user identities based on the comparison of user profile information. We first propose and analyze a schema-dependent method of user identity matching, which relies on one-to-one matching of the attributes of personal registration information. However, this method always needs to prune non-common attributes; consequently, profile information is wasted and the precision of linking may decrease. We therefore chiefly analyze and propose a bipartite graph matching method for registration information. First, ignoring the semantic meaning of each attribute, a complete weighted bipartite graph is built from the registration information of two users. The weight of each edge in the graph is set as the similarity between two attributes. Then, considering the similarities among all the corresponding attributes of two accounts, the extended maximum weight matching of the bipartite graph is obtained. Finally, the similarity between the two user accounts is calculated as the sum of the weights of the edges in the matching result. This similarity also represents the likelihood that the two accounts belong to the same owner. A series of experiments on Sina and Tencent microblog datasets validates the effectiveness of the proposed method for linking account registration information.

2. To address the problem that the profile information of Internet accounts may be incomplete or partially false, we propose a methodology for linking and detecting multiple identities of a user across websites that relies on the analysis of account name features. We first analyze in detail the naming features of account names (including length, usage of special characters, usage of numbers, typing pattern of characters, combination pattern of characters, and the uniqueness of English strings) on a massive dataset of account name and email name pairs. Furthermore, since each account name is associated with an email name, and the email name (the prefix of the email address) is also an account name of the same user, we perform the same statistical analysis of the changing pattern between an account name and its corresponding email name along the above feature dimensions, and thereby obtain the distributions of the account name changing features. For naming features, when the probability of a certain feature is low (that is, the naming feature is unique) and two account names both fit that feature, the certainty factor on that naming feature is high. For changing features, when the probability of a certain feature is high, many users change their account names using that changing pattern, and consequently the certainty factor on that changing feature is high. Finally, adopting the theory of certainty factors, the comprehensive certainty factor of the naming features and changing features is obtained. By comparing the result with a predefined threshold, we can identify whether different account names belong to the same user. Given a single username, the technique can also be used to find its owner's other potential account names in a candidate account name set. We evaluate the effectiveness of our method through a series of empirical studies on a large-scale dataset of nearly 48 million username and email name pairs.

3. In online social networks there are many sockpuppet accounts that are controlled by the same owner. To detect such sockpuppet accounts and their gang structures, we propose a methodology for detecting sockpuppet gangs based on the analysis of the sentiment orientation of each account and on multiple random walks on a similar-orientation network. First, we extract different kinds of emotional phrases (positive, negative and neutral) from the comment texts of each account, and then analyze the sentiment orientation of accounts toward a series of topics. Second, we link two user accounts if they have similar orientations toward most topics, and then build a similar-orientation network (SON) that takes the time factor of comments into account. Third, in the SON, we choose the vertex with the highest degree as the initial vertex and perform a random walk that follows the probability given by the weight of each connecting edge. After each step, the weight of the relevant edge in the SON is re-measured iteratively. Consequently, the weight between sockpuppet accounts that are controlled by the same owner is enhanced remarkably, and the gang structure of the SON is much clearer after multiple random walks. We then analyze the hierarchical and overlapping features of sockpuppet gang structures in the SON using multiple community detection algorithms (GN, BGLL and COPRA), and finally obtain the sockpuppet accounts and their gang structures. We evaluate our method on Ifeng comment datasets. Empirically, our approach to detecting sockpuppet gangs is preferable to previous approaches, since it obtains higher accuracy and clearer sockpuppet gang structures.

4. We propose a method for detecting sockpuppet accounts that have the same owner as a designated account, based on frequent binary itemset mining of user accounts and a consistency hypothesis test of comment features. First, starting from the designated sockpuppet accounts, frequent binary itemset mining is performed on the account series extracted from related topic comments. Then, we build the ego-network of the designated account and adjust the weight of each edge according to the co-occurrence frequency of the two adjacent accounts in the ego-network. We analyze the comment time series feature, the time interval feature, and writing features including sentence length, usage of function words and usage of punctuation. Edges and vertices in the ego-network are then pruned based on the consistency hypothesis test of the comment features of adjacent accounts. Finally, considering the influence of network topology in the ego-network, we obtain the comprehensive closeness between the designated account and its neighbor accounts. The higher the closeness, the more likely the accounts are to belong to the same owner. A series of experiments on Ifeng forum datasets shows that our method is effective at detecting sockpuppet accounts that belong to the same owner in online social networks.
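
As an illustration of the bipartite matching idea in contribution 1, the following is a minimal Python sketch, not the dissertation's implementation. It assumes plain maximum-weight matching (the "extended" variant is not specified in the abstract) and uses the standard library's difflib string ratio as a stand-in for the unspecified attribute similarity. The function names attribute_similarity and account_similarity are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def attribute_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; an assumed stand-in for the real measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def account_similarity(profile_a: list[str], profile_b: list[str]) -> float:
    """Sum of edge weights in a maximum-weight matching of two profiles.

    Attribute names (the schema) are ignored; only values are compared.
    Brute force over assignments, so only suitable for small profiles.
    """
    # Put the shorter profile first so each of its values is matched once.
    small, large = sorted((profile_a, profile_b), key=len)
    if not small:
        return 0.0
    # Complete weighted bipartite graph: one edge per pair of values.
    weights = [[attribute_similarity(s, t) for t in large] for s in small]
    best = 0.0
    for assignment in permutations(range(len(large)), len(small)):
        total = sum(weights[i][j] for i, j in enumerate(assignment))
        best = max(best, total)
    return best


if __name__ == "__main__":
    weibo = ["alice", "Beijing"]
    tencent = ["beijing", "alice", "1990"]
    print(account_similarity(weibo, tencent))
```

The brute-force search keeps the sketch dependency-free. For realistic profile sizes, an assignment solver such as scipy.optimize.linear_sum_assignment with maximize=True would be the usual choice. Dividing the result by the size of the smaller profile gives a score in [0, 1] that is easier to compare against a threshold.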
Keywords/Search Tags:Internet, User Profile, Linking Multiple User Identities, Accountname Naming Feature, Accountname Changing Feature, Sockpuppet Account