| With rapid developing of the application of social network, communication network, and biological network and so on, the regarding graph model data has presented explosive growth over the past few years. Graph, as a data structure, has its own representing method and information, and one graph model may contain hundreds to millions of vertices. However, the associated information of these vertices and their connecting edges has different meanings in different fields and with the expansion of the graph size, how to efficiently analyze the information and acquire the useful information has become a mainstream research direction. As an important tool in machine learning, clustering analysis has been widely applied to text excavation, bioinformatics, pattern recognition and other research fields. With the widespread use of graph model data, graph clustering has become one kind of important clustering method and also one of the most powerful techniques of graph data analysis.The similarity matrix is always constructed by the distance of nodes. But there are multiple paths of equal length and the K shortest path between nodes. And the relationship between these paths will influence the similarity of nodes. Therefore, contribute to a better balance between nodes similar to consider the distance relationship between nodes. Regarding this defect, this paper proposes a graph clustering algorithm DRGC based on top-k shortest path. This algorithm adopts some idea of spectral clustering, and we use the top-k shortest path between vertexes to build similarity matrix. Instead of eigen-decomposition, we use the auto-encoder to implement data reconstruction, which can reduce the time cost greatly. At last we use the non-parametric Bayesian model to do clustering. Due to the Dirichlet process has good clustering properties and it can perform data partitioning automatically, this algorithm can get reasonable partition for data set without pre-defined cluster number.In order to overcome single clustering algorithm’s problems of datasets sensitivity, this paper proposes a clustering algorithm based on Majority Voting Rule. It uses the DRGC, k-means, spectral clustering as base algorithm. Then it takes the clustering result with highest modularity as base label. We unify the cluster labels of different base algorithms by analysis the relationship between them, and give the final clustering result by Majority Voting Rule. In the end, emulation experiment was made for these two proposed algorithms and the experiment proofs that the two algorithms possess great clustering property and can obtain more accurate clustering partition results. |