Font Size: a A A

Research On Community Structure Detection Related Techneques In Large Scale Online Social Networks

Posted on:2016-06-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:S C JinFull Text:PDF
GTID:1318330536967151Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
With the popularity of social networking,the human society have stepped into online social networking era.Generally,online social networks have a ”community structure”feature,mining and analysis of community structure has become a hotspot and focus.Research on community detection has been carried out for many years,and have achieved fruitful results.However,with advances in information technology,current social networks showing many characteristics,such as large-scale,highly complex and dynamic evolution,etc.,which put forward higher requirements for community detection methods in speed,accuracy and dynamic response:1)Rapidity is a basic requirement.Faced with the online social networks of sustained Scale expansion,continuous interaction strengthen and evolving group behaviors,how to process the large-scale online social networks with efficient methods and techniques has been an important problem for community detection research.2)Accuracy is the pursuit of the goal for community detection techniques.Although there is no uniform standard to measure the quality for all community detection methods,with the same index,we can get a glance at the pros and cons of the various methods.How to find the real divide social network community natural sense,it is found in all communities direction way forward and the ultimate goal.3)Scalability is the development direction of the community detection methods.Traditional methods lack scalability,and cannot support the massively parallel processing,therefore are not suited to handle the current tens of millions of nodes scale online social networks.4)Dynamic evolution of community structures brings new challenges to community detection techniques.Structure dynamic of the online social networks,complexity and diversity of individual and group behaviors,they together bring new difficulties to the community detection research and propose a higher demand.Only pinpoint the community structure during its development process,will it be possible to understand and grasp the evolution of the internal mechanism of the community.In response to these problems,first,we selected current outstanding community structure detection algorithm –infomap,and based on the Map Reduce model,which is used to process big data,to implement its parallelization.Then,we promoted the parallelization to other community structure detection methods,and studied the supporting techniques for detecting community structures across multiple social networks.Specific research results are as follows:1.We paralleled the Infomap algorithm based on Map Reduce model and get a new one– infor MR,which achieved a high scalability.infomap requires a lot of random data access and uses global based greedy search strategy to find communities to combine.To solve this problem,infor MR,based on divide and conquer strategy,segments the social networks according to the order of node arrivals into segmentations;then replaced the global search with local greedy search,which reduced the search space,and facilitated the implementation on Map Reduce;Finally,programed the algorithm on Map Reduce to achieve a massive parallelization version of infomap.Experimental results show that the algorithm has better scalability,with near-linear speedup ratio on computing speed.2.We proposed a Map Reduce based multi-level graph partitioning algorithm– DMGP.infor MR has good scalability,but lower accuracy compared with the original algorithm,mainly because it uses a simple network partitioning algorithm which results in poor network partition quality.Multi-level graph partitioning methods have high partition quality and are widely used.However,existing algorithms cannot meet the need of large-scale online social networks analysis.DMGP is a parallelization of the Metis algorithm on Map Reduce model.1)during the graph(coarsening)compression stage,we implemented the heavy edge matching schema on Map Reduce and archived an approximately exponential compression ratio;2)during the same stage,we used massively parallel reading to improve I/O performance,and avoided the overhead required by saving the lots of intermediate results;3)during the uncoarsening(partition recovery)stage,we use Map Reduce to calculate the quality evaluation of the partition,and to optimize the partition quality.Experiments show that,DMGP can process scale of above tens of millions nodes of online social networks effectively,and its accuracy was less than 10% compared with Metis;DMGP based community detection algorithm– info MR has a quite accuracy compared to informap.3.We proposed a universal community detection framework– MRHCD based on Map Reduce.Most of traditional methods have the same problem as infomap.After analyzing and summarizing the community detection process of info MR,we believe that all communities can be attributed to the discovery process in two phases: network partitioning phase and parallel community detection phase.MRHCD consists of DMGP and HCD.During the network partitioning phase,DMGP is able to get high quality partitions.During the parallel community detection phase,HCD model is able to quickly deploy the community detection algorithms without the need to modify the business logic of the algorithms,and can ensure the community detection accuracy.Deployment of three outstanding community detection methods verifies that MRHCD is able to accelerate the community detection process,and improve their scalability with little loss of precision.4.We proposed a synergistic graph partitioning algorithm–CPMN,a supporting technique for community detection across multiple social networks,which has not been studied by now.People prefer enjoying themselves in multiple social networks.MRHCD is not able to process multiple social networks.The relationship of the multiple social networks are caused by the individuals keeping accounts in them(called as anchor node).CPMN is based on the DMGP,and is used to process multiple social networks.In CPMN,we added the anchor node distribution constraint.During the coarsening phase,we introduced the priority schema and purity to guarantee and record the consistency of the anchor nodes;during the initial partitioning phase,we used the label propagation model to get the initial partition mode;during the uncoarsening phase,we carried out massive parallel partition refinement operations to improve the partition quality,including the cut size and the balance factor.Experiments on two online social networks showed that CPMN is able to partition the anchor nodes into high consistent partitions,and it also has high processing speed and scalability.To summarize,this paper investigates an essential issues of social network analysis,i.e.,community detection.The perfomance evaluation analysis all show that the proposed algorithm can properly achieve the design goals.
Keywords/Search Tags:social network, community detection, MapReduce, infomap, synergistic partitioning, graph partitioning
PDF Full Text Request
Related items