Research on Semantic-Based Community Detection Algorithms for Distributed Environments

Posted on: 2016-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Lv
Full Text: PDF
GTID: 2308330464956757
Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of the Internet and especially the growing popularity of social networks, community detection has become an important research topic. As data grows explosively, the relationships among individuals in a social network become increasingly complex. Finding closely related individuals in a complex social network and grouping them into one community has become an important task.

For large-scale data, traditional centralized computing methods take a long time to detect communities and cannot provide real-time feedback on the network. Distributed systems, with their high throughput, high concurrency, and low latency, can meet the processing requirements of large-scale social networks, and distributed processing of social networks has recently drawn researchers' attention. This thesis focuses on community detection based on the distributed computing platform Hadoop, which can improve the efficiency of large-scale community detection. The thesis covers three aspects:

First, a distributed storage approach for large-scale social networks is proposed to partition the graph on HDFS. The approach adopts an improved distributed co-clustering algorithm to divide the social network into partitions. The algorithm aims to place closely related individuals on the same DataNode of HDFS in order to improve the efficiency of community detection.

Second, a distributed community detection algorithm based on MapReduce is designed to process the distributed social network data in parallel. The algorithm uses modularity to measure community quality and introduces a semantic model as a preprocessing step, building semantic relations among individuals with a Bayes model to improve the accuracy of community detection.

Finally, the effectiveness of the proposed approach is verified through extensive experiments on real datasets. The experiments show that the proposed distributed co-clustering algorithm processes large-scale data more efficiently than the traditional centralized co-clustering algorithm. The distributed modularity-based community detection algorithm improves both the accuracy of community partitioning and the efficiency of modularity-based detection.
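
The abstract does not give the exact modularity formulation used in the thesis. As a point of reference only, the standard Newman modularity of an undirected, unweighted graph can be written as Q = sum over communities c of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c is the number of edges inside community c, and d_c is the total degree of its nodes. The sketch below computes this on a single machine; the function name `modularity`, the edge-list input, and the `community` mapping are illustrative assumptions, not the thesis's MapReduce implementation.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
                 (assumption: no self-loops, no duplicate edges)
    community -- dict mapping every node to a community label
    """
    m = len(edges)
    if m == 0:
        return 0.0

    internal = defaultdict(int)    # e_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum_c [ e_c / m - (d_c / 2m)^2 ]
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0: one big community scores zero
```

Because the per-community sums `internal` and `degree_sum` are plain counts over edges, they can in principle be accumulated independently per partition and then added together, which is one reason modularity is a common choice for parallel settings such as the one the thesis describes.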
Keywords: Graph Partitioning, Community Detection, Distributed Computing, Modularity