
Research On Social Network Sampling Algorithm Based On Random Jump Strategy

Posted on: 2020-03-08 | Degree: Master | Type: Thesis
Country: China | Candidate: L M Wang
GTID: 2428330572968589 | Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, social networks are affecting and changing people's lives, and research on network structure based on large, complex network datasets is increasingly popular. Because existing social network data are very large and raise privacy issues, it is hard to analyze an entire network directly. A reliable and effective network sampling algorithm is therefore essential for estimating the properties of online social networks (OSNs) in practice. Existing network sampling methods such as the Metropolis-Hastings Random Walk (MHRW) can obtain unbiased sample sets from relatively large-scale social networks such as Facebook and capture the key features of the original network. Moreover, MHRW uses a distribution function to control sampling, which meets the needs of social network sampling. However, MHRW suffers from oversampling parts of the graph during the sampling process. To address these characteristics of online social network data and this defect of the MHRW algorithm, this thesis carries out a series of studies on online sampling and sample evaluation, centered on network data sampling. The main research contents and innovations are as follows:
1. To solve the problem of partial graph oversampling in MHRW, a random jump strategy is introduced, yielding a hybrid jump sampling algorithm (Hybrid Jump sampling, HJ). Extensive experiments were conducted on Facebook and Twitter datasets. Network characteristics such as convergence, degree distribution, distribution of sampled nodes, and transitivity were compared across sampling methods, and the results indicate that HJ performs better and can be applied widely. Furthermore, experiments that vary the jump probability in HJ show that different jump probabilities have only a small, negligible effect on the convergence of the HJ algorithm.
2. Taking the Zhihu online network as an example, a distributed social network sampling system based on HJ is designed. The thesis describes the composition of the two systems and the resource optimization scheme in detail. User URLs in the Zhihu online network are sampled with the HJ algorithm, after which user page information is extracted, processed, collated, and stored. This crawler system makes it easier to collect and store social network data.
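The abstract does not give HJ's exact transition rule, so the sketch below assumes one plausible reading of "MHRW plus a random jump strategy": at each step, with probability `p_jump` the walker jumps to a node chosen uniformly at random from the whole graph; otherwise it takes an ordinary MHRW step (propose a uniformly random neighbor v of the current node u and accept with probability min(1, deg(u)/deg(v))). The function names `mhrw_step` and `hybrid_jump_sample`, the parameter `p_jump`, and the toy graph are all hypothetical. Uniform jumps also assume the crawler can address any node directly (for example by user ID), which a real OSN may not allow.

```python
import random
from collections import Counter


def mhrw_step(graph, current, rng):
    """One Metropolis-Hastings random walk step whose stationary distribution
    is uniform over nodes: propose a uniformly random neighbor and accept it
    with probability min(1, deg(current) / deg(neighbor))."""
    neighbor = rng.choice(graph[current])
    if rng.random() < len(graph[current]) / len(graph[neighbor]):
        return neighbor
    return current  # rejected: stay put and re-sample the current node


def hybrid_jump_sample(graph, start, n_samples, p_jump, seed=None):
    """Hypothetical hybrid-jump sampler: with probability p_jump, jump to a
    uniformly random node; otherwise take one MHRW step.

    graph: dict mapping each node to a list of its neighbors (undirected,
    every node assumed to have at least one neighbor).
    """
    if not 0.0 <= p_jump <= 1.0:
        raise ValueError("p_jump must lie in [0, 1]")
    rng = random.Random(seed)
    nodes = list(graph)
    current = start
    samples = []
    for _ in range(n_samples):
        if rng.random() < p_jump:
            # Random jump: assumes any node can be reached directly (e.g. by ID).
            current = rng.choice(nodes)
        else:
            current = mhrw_step(graph, current, rng)
        samples.append(current)
    return samples


if __name__ == "__main__":
    # Toy undirected graph: node 0 is a hub linked to all others, and
    # nodes 1-5 also form a path, so degrees range from 2 to 5.
    graph = {
        0: [1, 2, 3, 4, 5],
        1: [0, 2],
        2: [0, 1, 3],
        3: [0, 2, 4],
        4: [0, 3, 5],
        5: [0, 4],
    }
    samples = hybrid_jump_sample(graph, start=0, n_samples=60_000, p_jump=0.1, seed=42)
    counts = Counter(samples)
    for node in sorted(graph):
        print(node, round(counts[node] / len(samples), 3))
```

A design note on this reading: the MHRW kernel and the uniform-jump kernel both leave the uniform distribution over nodes invariant, so any mixture of the two does as well. Under these assumptions the jumps let the walk escape a region it would otherwise oversample without biasing the target distribution. On the toy graph, each node's share of a long run should come out close to 1/6.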
Keywords/Search Tags: OSNs, MHRW, cubic spline interpolation, social network crawler system