Font Size: a A A

Incremental Methods For Large-scale Social Network Analytics

Posted on:2022-06-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:J J WangFull Text:PDF
GTID:1480306731483504Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the rapid development of Internet technology and social network media on various electronic products(e.g.computers,mobile phones,i Pads),online social networks play important and indispensable roles in people’s lives,and they have become the important channel to make friends,share news and obtain information.Large-scale data continues to grow,which brings unprecedented opportunities and challenges to social network mining and analysis.This dissertation combines the frontiers of social network analysis,graph computing,probability theory and incremental computing to work on incremental methods for large-scale social network analytics.Specifically,for topological structure analysis and application analysis in large-scale social networks,this paper proposes incremental processing models and algorithms respectively.Our goal is that our methods can incrementally update the results of the previous analysis or prediction based on the new data generated continuously,and achieve the balance of accuracy and efficiency.Our contributions and innovations are summarized as follows:(1)Approximate temporal motif counting in large-scale social networksThe interactive data in the social networks contains rich temporal information,so it can be modeled as a network with timestamps on edges,i.e.,temporal graph.The dynamic and incremental data streams can be modeled as temporal graph streams,i.e.,temporal graphs presented in the form of data streams.We study the approximate counting problem of temporal motif in large-scale temporal graphs and temporal graph streams.The temporal motifs not only consider the subgraph structure(i.e.,subgraph isomorphism),but also the edges directions,edges orderings and motif duration.Therefore,the subgraph counting methods in traditional graphs cannot be used directly in temporal graphs.In addition,the only existing approximate temporal motif counting method is not efficient,especially in large-scale temporal networks.Therefore,we try to propose efficient temporal motif counting algorithms.We first design a general edge sampling ES algorithm to estimate the number of copies of any temporal motif in large-scale temporal graphs.Next,for temporal motifs with3 vertices and 3 edges,which are important motifs to characterize the network structures,we combine edge sampling and wedge(i.e.,a path of two edges)sampling,and propose an improved EWS algorithm.Then,for large-scale temporal network streams,we improved the above ES and EWS algorithms,and respectively proposed streaming algorithms(i.e.,SES and SEWS)for approximating temporal motifs.What is more,we conduct comprehensive analyses of the expected values,the theoretical bounds and the time complexities of our all algorithms.Finally,a large number of experimental results on 5 real datasets show that our algorithm is more accurate,efficient and scalable than the state-of-the-art sampling algorithm for temporal motif counting.(2)Incremental group-level popularity prediction in large-scale social networksThe popularity of a piece of information refers to the number of attentions it obtains,such as,the forwarding times of a message,the viewing times of a video.Given a piece of information,popularity prediction is to predict in advance the attentions that it will receive in the future,based on its early features of information diffusion(including temporal features,diffusion path features,users features,and so on).In online social networks,information diffusion evolves dynamically over time,and traditional static popularity prediction methods cannot characterize the dynamic nature of information diffusion.Therefore,the study on incremental popularity prediction is a necessary and urgent issue to be explored.In addition,compared with the macro(i.e.,population-level)and micro(i.e.,user-level)popularity predictions,the group-level popularity prediction has a low cost and explores more fine-grained prediction results.Therefore,we identify an incremental group-level popularity prediction problem.We also propose a novel incremental popularity prediction model,which is based on the incremental CANDECOMP/PARAFCAC(CP)decomposition technology.Finally,a large number of experimental results on two real social network datasets show that our model is superior to other works in terms of prediction accuracy and efficiency.(3)Reducing cumulative errors of incremental CP decomposition in large-scale social networksCombining the necessity of incremental computing in social network analysis,and the advantages of incremental CP decomposition(i.e.,it efficiently and incrementally mines hidden information among large-scale multi-dimensional data),we try to explore and promote the applications of the incremental CP decomposition.The incremental CP decomposition is efficient,but with the increase of new data,it arises serious cumulative errors.Focusing on this problem,we deeply analyze the errors(which can be divided into the cumulative reconstruction error and the prediction error),and their root causes.Next,we turn our goal of reducing errors while maintaining high efficiency into two optimization problems.Then,based on the errors features,the features of practical applications,and the features of large-scale incremental data,we proposed several restart strategies to solve these two problems,respectively.Finally,in order to verify the performances of our methods,we apply them in two typical online social network applications,including dynamic network reconstruction,and dynamic link prediction.We summarize some main findings by theoretical analysis and experiments,and provide valuable take-home messages for promoting the application of incremental CP decomposition in online social networks.
Keywords/Search Tags:Incremental process, popularity prediction, link prediction, social networks analysis, network structure mining
PDF Full Text Request
Related items