Font Size: a A A

Study On Overlapping Community Detection Based On Local Expansion And Optimization

Posted on:2016-07-12Degree:MasterType:Thesis
Country:ChinaCandidate:C ZhangFull Text:PDF
GTID:2308330479984825Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In the development of some fields like bioinformatics, social networks and web analysis, a lot of data in these complex networks have been accumulated. It makes the job of detecting community structure in those kind of data in a timely and quick way become an important task in the field of data mining. A feature of the definition of community in those traditional algorithms is that communities are always disjoint. However, in the real data structure, a node, sometimes, belong to multiple communities. It means that there are overlaps between communities. Considering the characteristic of the diversity of data structure, developing a fast and accurate algorithm for overlapping community detection in complex system structure has become a more challenging task. In this paper, we mainly study the overlapping community detection algorithm.In the year of 2009, Andrea Lancichinetti, etc, proposed LFK algorithm. It is an efficient overlapping community detection algorithm which is based on an efficient local expansion and optimization model. And in the experiment of the algorithm, it design a better evaluation index to evaluate the accuracy of overlapping community detection methods. Based on this, Conrad Lee, etc, proposed GCE algorithm in 2010.It solved the problem of LFK that some special seed can’t stop expanding in a cyclic extension. These algorithms are simple and efficient, but also have some shortcomings and some place for improvement: Firstly, the inner flow of the algorithm can be adjusted to make parallelization possible. Secondly, the factor α in the local expansion and optimization function can be consider into the community growing process so that the selection of factor α can be self-adaptive. It reduce many manual operations of α so that the algorithm can converge to the optimal results more quickly.In this paper, we made some targeted research toward these above-mentioned shortcomings, the research and achievements are as follows:Firstly, introducing the α adaptive mechanism into GCE algorithmIn this case, we adjusted the mechanism of the local expansion and optimization through analysis the local expansion and optimization function used in community growing process. This makes the algorithm can adaptively select the factor α and maintaining the accuracy, thus removing the complicated operation of manual selection of factor α. And after the introduction of adaptive mechanism, we further reduced the candidate set used in extension on the basis of the adaptive model, so as to enhance the speed of the algorithm and make up for the lost performance of the adaptive mechanism.Secondly, introducing a parallel model into GCE algorithmIn this case, we split the seed expansion process and the alternative community filtration process into different CPU based on the analysis of working flow and principle of the GCE algorithm. This make the algorithm achieve the task level parallelism. At the same time, partitioning task into subtasks to process subsets of data. This make the algorithm achieve the data level parallelism. All above, we used the multi-core of computers, so as to enhance the speed of algorithm’s execution.Thirdly, making experiments to verify the improvementWe use the modified standard mutual information index(NMI) as the accuracy evaluation index. Then, construct different categories of artificial data structure according to the LFR model, So as to analyze and verify the GCE algorithm and the improved algorithm. In the last, we use the protein protein interaction data sets from the Combined-AP/MS network and the known protein compound data sets listed in CYC data sets to do an experiment in real data sets. The experiments show that the improved algorithm has a certain improvement in the availability and more quick converge to the optimal results on the algorithm.
Keywords/Search Tags:Data mining, overlapping community detection, local clustering, Multi-core parallel computing
PDF Full Text Request
Related items