Research On Fast Graph Clustering Algorithm On Large-Scale Data

Posted on: 2022-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: F Z You
GTID: 2518306509970179
Subject: Computer Science and Technology
Abstract/Summary:
As an important technique in data mining, cluster analysis has been applied in many fields. Among clustering methods, spectral clustering, a representative graph clustering algorithm, is increasingly widely used. As data sets keep growing in the information age, the volume of data to be processed has become too large for traditional methods to handle easily. This thesis focuses on accelerating graph clustering on large-scale data sets and systematically studies methods for selecting key nodes and algorithms for accelerating graph clustering at that scale. The main research contents are as follows:

(1) To make spectral clustering usable on large-scale data sets, a fast graph clustering algorithm based on key-node selection is proposed. The algorithm consists of three steps: first, a fast node-weight evaluation method is built on the compactness and separation of clusters; second, the selected key nodes replace the original data set when constructing a bipartite graph, and approximate eigenvectors of the data are obtained by singular value decomposition; third, multiple sets of approximate eigenvectors are integrated to improve the robustness of the approximate spectral clustering result. The new algorithm was compared experimentally with other representative spectral clustering algorithms on benchmark data sets, and the results show that it identifies complex cluster structures in the data more efficiently than the other clustering algorithms compared.

(2) To further improve the accuracy and speed of graph clustering on large-scale data sets, a fast graph clustering algorithm based on an improved bipartite graph is proposed. Building on the previous algorithm, it selects key nodes a second time while constructing the bipartite graph, which reduces the size of the matrix required for singular value decomposition: when the data set is of size n × n, the matrix to be decomposed shrinks from d × n to d × m. Experimental comparison with the bipartite-graph-based clustering algorithm shows that the new algorithm computes faster while maintaining clustering accuracy.

(3) To demonstrate graph clustering algorithms on large-scale data, a fast graph clustering system for large-scale data is designed. The system includes modules for data import, algorithm parameter settings, and result display, among others. It integrates several of the algorithms described in this thesis and shows clearly how effective each algorithm is on different data sets.

The results of this thesis enrich research on clustering large-scale data sets and open further possibilities for clustering algorithm research in the era of big data.
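Step (1) replaces the full n × n similarity graph with a bipartite graph between the data points and a small set of key nodes, so that an approximate spectral embedding can come from the singular value decomposition of a thin matrix. The Python/NumPy sketch below illustrates that general idea only; it is not the thesis's method. The compactness/separation weighting used to choose key nodes is not described in the abstract, so key nodes are sampled at random here, and the Gaussian kernel, the `r`-nearest-key-node sparsification, the column normalization by inverse square-root degree, and the function names `kmeans` and `bipartite_spectral_clustering` are all assumptions for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first initialization; returns labels."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def bipartite_spectral_clustering(X, k, n_anchors=50, r=5, seed=0):
    """Hypothetical sketch: spectral clustering via a point/key-node bipartite graph."""
    rng = np.random.default_rng(seed)
    # Assumption: key nodes are chosen by uniform random sampling.
    anchors = X[rng.choice(len(X), size=n_anchors, replace=False)]
    # n x p squared distances between data points and key nodes.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Keep edges only to the r nearest key nodes of each point.
    nearest = np.argsort(d2, axis=1)[:, :r]
    rows = np.arange(len(X))[:, None]
    sigma = np.mean(d2[rows, nearest]) + 1e-12
    B = np.zeros_like(d2)
    B[rows, nearest] = np.exp(-d2[rows, nearest] / sigma)
    B /= B.sum(axis=1, keepdims=True)          # each row sums to 1
    # Scale each key-node column by the inverse square root of its degree.
    B_hat = B / np.sqrt(B.sum(axis=0) + 1e-12)
    # Left singular vectors of the n x p matrix are eigenvectors of
    # B_hat @ B_hat.T, so no n x n eigendecomposition is needed.
    U, _, _ = np.linalg.svd(B_hat, full_matrices=False)
    return kmeans(U[:, :k], k, seed=seed)
```

For example, on two well-separated Gaussian blobs, `bipartite_spectral_clustering(X, k=2)` should give each blob its own label. For a fixed number of key nodes p, the distance computation and the thin SVD both grow linearly in n, which is the kind of saving the bipartite construction is meant to provide.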
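Step (1) also integrates several approximate results to make the clustering more robust, but the abstract does not say how they are combined. One simple, well-known way to combine several labelings is evidence accumulation: count how often each pair of points shares a cluster across runs, link the pairs that co-occur in more than half of the runs, and take the connected components as consensus clusters. The sketch below is a hypothetical stand-in for the thesis's integration step; the function name `consensus_labels` and the 0.5 threshold are assumptions, and its n × n co-association matrix is only practical for small n.

```python
import numpy as np


def consensus_labels(labelings, threshold=0.5):
    """Evidence-accumulation consensus over several label vectors.

    labelings: list of 1-D integer arrays of equal length n.
    Returns consensus labels numbered 0, 1, 2, ... in order of first appearance.
    """
    L = np.asarray(labelings)                  # shape (runs, n)
    n = L.shape[1]
    # Co-association: fraction of runs in which points i and j share a label.
    co = np.mean(L[:, :, None] == L[:, None, :], axis=0)
    linked = co > threshold
    labels = np.full(n, -1)
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over pairs linked in a majority of runs.
        stack = [start]
        labels[start] = next_label
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i] & (labels == -1)):
                labels[j] = next_label
                stack.append(j)
        next_label += 1
    return labels
```

For instance, given the runs `[0, 0, 0, 1, 1, 1]`, `[1, 1, 1, 0, 0, 0]` (same partition, permuted labels) and `[0, 0, 1, 1, 1, 1]` (one point misassigned), `consensus_labels` returns `[0, 0, 0, 1, 1, 1]`. Because the co-association matrix compares partitions rather than raw label values, label permutations between runs do not matter.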
Keywords: Cluster analysis, Graph clustering, Bipartite graph, Cluster ensemble, Spectral clustering