Font Size: a A A

Research On Parallel Algorithm For Influence Maximization Problem In Social Networks

Posted on:2018-01-16Degree:MasterType:Thesis
Country:ChinaCandidate:Y F ChangFull Text:PDF
GTID:2428330566497486Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The influence maximization problem is one of the most important problems in social networks,it aims at finding a small set of nodes,let these nodes influence other nodes,so that the number of nodes influenced by the initial set is maximum.For large-scale social networks,a single commercial computer is no longer able to complete the task both on the amount of data stored and on the computing time.How to effectively find the most valuable users in the massive social network to spread the information has become a new problem in the era of Big Data.In this paper,based on the characteristics of massive social networks,we solve the influence maximization problem using Hadoop's distributed storage and distributed parallel computing and Google Pregel's “Vertex-Centric” parallel computing model.In order to solve the problem of cost sensitive influence maximization,a heuristic parallel algorithm named MDCR is proposed in this paper.The algorithm takes into account the topology of the network and the cost factor,a nd preferentially selects the node that has maximum degree over cost ratio to join the seed set.As for the social network with community structure,the general heuristic algorithm may lead to the overlapping of influence.This paper proposes a CMD algorit hm based on the community structure property of social network.CMD uses community discovery algorithm to mine community information in social networks,and filters out significant communities through Map Reduce.The number of seeds is allocated according to the size of the community to form the final seed set.For online social networks in certain scenarios,the dissemination of information often requires a certain time limit.Based on the independent cascade model,we design a time-sensitive propagation model by adding hop limit and propose the PPDD algorithm.The algorithm uses the degree and propagation probability of a node to approximate its influence based on the hop limit.In each round of selection,the node that has the greatest influence will be added to the seed set,the influence of the neighbors of the selected node will be recalculated by discounting.Based on real data sets,the experiments are conducted on the proposed MDCR,CMD and PPDD algorithms.The results show that,the three algorithms have good performance both on the influence spread and running time for large-scale social networks.The extended experiments are also conducted with different scale of Hadoop cluster,the results show that when the scale of cluster increases,the running time of these algorithms is further reduced,and the efficiency is further improved.
Keywords/Search Tags:influence maximization, social networks, parallel algorithm, hadoop
PDF Full Text Request
Related items