Font Size: a A A

The Research On Parallelization Of The Community Partitioning Algorithm

Posted on:2013-01-07Degree:MasterType:Thesis
Country:ChinaCandidate:S Y LiFull Text:PDF
GTID:2248330371969920Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Community partitioning has important applications in biological and medical sciences.But with the expansion of data size, the classic serial algorithm is not sufficient. In order tomeet the needs of the development of large-scale data and information technology, studies ofparallel partitioning algorithm becomes urgent. With the rapid development, application ofparallel computing becomes possible especially after the use of the MapReduce programmingmodel, which makes the computation much convenient.At first, the spectral bisection method was optimized and then parallelization was madeon spectral bisection method based on the MapReduce programming model. MapReduce is anopen source programming model in Google, compared to traditional parallel computing model,the MapReduce model hides and encapsulates the details of the bottom level, which greatlyfacilitates the design and implementation of parallel program. By using the MapReduceparallel programming, the user can concentrate on his parallel tasks and do not need toconsider the details of the application and the messaging.Spectral bisection method is a classical algorithm of community partitioning, it calculatesaccording to the eigenvalues and eigenvectors of the Laplace matrix and the aggregationrelationship. But this method becomes bloated for massive data processing. With theexpansion of data size, the time complexity becomes intolerable.In order to solve this problem, the Canopy and the Lanczos algorithms were used toimprove the spectral bisection method, and increase the operating efficiency of the serialalgorithm. In order to further improve the efficiency of the algorithm, the MapReduceprogramming model was used to improve the parallelism of the spectral bisection method,which further improved the computing speed. The experimental analysis shows that theaccuracy of the optimized spectral bisection method is basically identical with that of theunoptimized spectral bisection algorithm. But, the execution speed and convergence of theoptimized spectral bisection method is obviously improved. With the parallelization of theoptimized spectral bisection method, the parallel efficiency of the algorithm is graduallyincreased with the increase of the compute nodes, which further demonstrates the superiorityof the parallel algorithm in dealing with large-scale data.In summary, the main work and innovations of this thesis are in the following twoaspects:(1) An optimized spectral bisection method is firstly introduced, and Lanczos is used incalculation of eigenvalues and eigenvectors, so as to quickly find the eigenvalues satisfying the conditions. And then, in the final space Poly classification of the new samples, Canopyalgorithm is used in order to quickly and accurately find the initial cluster centers, which canroughly divides the network data into a number of relatively compact subsets in theinitialization, help to the selection of the class center, and improve the convergence rate.(2) According to the complex network of massive data size and the characteristics ofMapReduce programming model, the data storage is optimized by reasonable data slice andoptimization to make load balance and efficiency increase.(3) The spectral bisection method is optimized by using the MapReduce programmingmodel. The experimental data verify the high efficiency of the optimized and parallelizedspectral bisection method.
Keywords/Search Tags:Community partitioning, Spectral bisection method, k-means, Canopy, Lanczos, MapReduce
PDF Full Text Request
Related items