Font Size: a A A

Research On Substructures-based Sampling Algorithms For Graphlet Estimations In Social Networks

Posted on:2019-02-20Degree:MasterType:Thesis
Country:ChinaCandidate:Q Q ZhaoFull Text:PDF
GTID:2428330542994229Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Graphlets,small connected induced subgraphs in large-scale networks,have ex-tensive applications in social networks and bioinformatics,etc.As the size of graphlets k increases,the number of graphlets increases rapidly and their structures become more complicated.It is a challenge to estimate the relative frequency of all graphlets(graphlets concentrations)quickly in large social networks.Due to the high computational cost of exact counting in massive networks,estimating graphlet statistics approximately via sampling algorithms has become a mostly common used approach.However,most of these algorithms can only estimate graphlets with no more than 5 nodes,and it is difficult to extend to graphlets with more nodes.Therefore,it is significant to design a sampling algorithm with high accuracy and can scale to sample all k-graphlets for larger size k.In this dissertation,we propose a sampling algorithm for graphlet estimations CSRW(Common Substructure based graphlets sampling via Random Walk).Given graphlet size k,CSRW first samples a largest common substructure shared by all k-graphlets,2-path((?)),then expands it via generating nodes randomly from neighbors of multiple nodes rather than a fixed one,to get k-graphlets samples,thus CSRW is able to estimate all k-graphlets in one unified way.In addition,when the size k increases,k-graphlets present more complicated topol-ogy structures.Those denser graphlets tend to appear less in real social networks and thus it is more dificult to sample them.So we propose another sampling algorithm,CSRW2,to improve the sampling accuracy of denser graphlets with less appearance.Given k(k=4,5),CSRW2 first samples two substructures,(k-1)-path and 3-star((?)),expands them to get k-graphlets samples,respectively,and then mixes the two kinds of samples with proportional amplification.CSRW2 can adapt to the complicated changes of graphlet structures and estimate all graphlet concentrations efficiently.Comprehensive experimental evaluations on real networks demonstrate that CSRW can uniformly estimate all k-graphlets.CSRW outperforms the representative algo-rithms SRW2CSS and WRW in accuracy.The estimations on 6,7-graphlets also prove the scalability of CSRW.Furthermore,CSRW2 can estimate all k-graphlets in a uni-form framework(k=4,5),and compared with CSRW,CSRW2 is more friendly to those denser k-graphlets with less appearance in social networks.
Keywords/Search Tags:graphlets, graphlet concentrations, random walk, sampling, social networks
PDF Full Text Request
Related items