
Research On Parallelization Of Community Discovery Algorithm Based On Hadoop

Posted on: 2018-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Z Lai
Full Text: PDF
GTID: 2348330518961609
Subject: Computer technology
Abstract/Summary:
Community discovery algorithms are the main tools for studying community structure in complex networks. With the explosive growth of network scale, traditional serial, single-machine community discovery algorithms are no longer suitable for large-scale networks. As an emerging big-data processing technology, Hadoop is favored by many developers for its high scalability, high reliability, and simple programming model. To address the limitations of current serial community discovery algorithms on large-scale networks, this thesis proposes two parallel community discovery algorithms that exploit the strengths of the Hadoop framework in big-data processing.

To address the high complexity of the Fast-Newman algorithm, this thesis first proposes a parallel Fast-Newman algorithm based on the Hadoop framework. The main difficulty in parallelizing Fast-Newman is that, inside the map function, a node cannot obtain information about its neighbor nodes, so global graph information must be supplied through a Web service. The improved algorithm computes in parallel, in the map function, the modularity increment that would result from merging each node with each of its neighbors. The reduce function then finds the pair of nodes with the largest modularity increment and merges them; this constitutes one merge step. The merged result is fed back as the input of the map function, and the map and reduce phases are iterated until all nodes have been merged into a single community. Experiments on the Ego-Facebook dataset show that the parallel Fast-Newman algorithm achieves a high speedup.

To handle large-scale networks, this thesis further proposes PFU (Parallel Fast-Unfolding), a parallel Fast-Unfolding algorithm based on Hadoop. The algorithm follows a divide-and-conquer strategy: the large-scale network is first partitioned and each partition is merged independently; the network is then reconstructed according to the result of each partition; finally, the reconstructed network is merged repeatedly until the community structure no longer changes. The parallelization scheme faces two difficulties: how to ensure that edge-connection information is not lost after partitioning, and how to reconstruct the network once partitioning is complete. This thesis solves these problems by improving the data storage format and designing a reconstruction scheme. Experimental results show that, while preserving accuracy, the PFU algorithm significantly improves efficiency and exhibits good scalability. Finally, the intermediate files output by the map and reduce phases are visualized with the Gephi software, which improves the practical value of the PFU algorithm.
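To make the map/reduce division of labour concrete, the following is a minimal, single-process Python sketch of one parallel Fast-Newman merge step. It is an illustration only, not the thesis implementation: the function names and in-memory data layout are assumptions, and the real system runs the map and reduce phases on Hadoop with global graph information served via a Web service. The map phase emits, for every pair of adjacent communities, the modularity increment dQ = 2(e_ij - a_i * a_j); the reduce phase selects the pair with the largest increment and merges it.

```python
from collections import defaultdict

def map_phase(edges, community):
    """Map: for each pair of adjacent communities, emit the modularity
    increment dQ = 2*(e_ij - a_i*a_j) that merging them would produce."""
    m = float(len(edges))
    e = defaultdict(float)   # e[(i, j)]: fraction of edges between communities i and j
    a = defaultdict(float)   # a[i]: fraction of edge endpoints inside community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        e[tuple(sorted((cu, cv)))] += 1.0 / m
        a[cu] += 0.5 / m
        a[cv] += 0.5 / m
    return [((i, j), 2.0 * (e[(i, j)] - a[i] * a[j]))
            for (i, j) in e if i != j]

def reduce_phase(candidates):
    """Reduce: pick the community pair with the largest increment."""
    return max(candidates, key=lambda kv: kv[1])[0]

def merge_step(edges, community):
    """One merge iteration: map, reduce, then relabel community j as i."""
    i, j = reduce_phase(map_phase(edges, community))
    return {n: (i if c == j else c) for n, c in community.items()}
```

In the full algorithm, `merge_step` would be iterated until a single community remains, while tracking the partition that attains the best overall modularity along the way.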
Keywords/Search Tags: community discovery, Hadoop, large-scale network, parallelization, visualization