Font Size: a A A

Hypergraph Clustering Method Based On Spark

Posted on:2017-12-13Degree:MasterType:Thesis
Country:ChinaCandidate:X L LiuFull Text:PDF
GTID:2348330536953092Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
Clustering Method is one of the hotspots in data mining and in recent years,spectral clustering method based on graph theory has become a very active fiield of machine learning research.Compared with the traditional clustering algorithms such as K-Means,spectral clusetring method can cluster in the sample space of arbitrary shape and converges to the global optimal solution.Spectral clustering caculate the Laplacian matrix of similarity matrix of graph.Then obtaining eigenvalues and eigenvectors and cluster in the eigenvectors.The traditional spectral clustering is based on the partition of simple graph,but the simple graph has some shortcomings in multivariate relationla data representation.In recent years,hypergraph model widespread concern academia.Compared with traditional graph model,it can describe the high-order information of the data very well.Due to the high complexity of spectral culstering itself and establishing hypergraph model is also more complex than traditional digram,using the algorithm on large scale data sets computational complexity and stroage capacity will be very big and hardly apply in single machine.However,studies in cloud computing and big data processing plateform is very popular in recent years.For example,Hadoop platform based on MapReduce and Spark platform based on RDD can help parallel the algorithm and imporve the operating efficienty of the algorithm.Since Spark platform is based on memory computing,processing speed is better than Hadoop and good at iteration.This paper choose Spark platform achieve hypergraph spectral clustering.This paper introduces the paralel algorithm technologies involed first,including Spark platform and programming model,clustering algorithms and evaluation crieria of clutering,basic concept of hypergraph.Then introduces the princilpe algorithm of hypergraph spectral culstering,the relationship between laplacian matrix and graph division,give the general procedure of hypergraph spectral clustering.Secondly,achive the parallel algorithm of hypergraph spectral clustering on Spark platform including four-stage parallelization:hypergraph constructing and generate laplacian matrix,obtaining eigenvalues and eigenvectors and K-Means clustering.Finally,this paper designs several experiments in the open data sets and mobile users advertisement data sets.Analyze the clustering result and running time of the two algorithm and find the clustering result of hypergraph spectral clustering is more realistic and its parallel algorithm on the Spark paltform is better than single algorithm.
Keywords/Search Tags:Hypergraph, Spectral Culstering, Spark, parallelization, Laplacian matrix
PDF Full Text Request
Related items