The Research On Graph-based Clustering Analysis

Posted on: 2019-12-09  Degree: Master  Type: Thesis
Country: China  Candidate: T Zhang
GTID: 2370330563998474  Subject: Operational Research and Cybernetics
Abstract/Summary:
Clustering analysis is currently an active research topic in machine learning. It aims to split a data set into several meaningful clusters (also called "classes") so that the data can be interpreted and recognized. Graph-based clustering (also referred to as graph clustering in this thesis) is a newer approach that first represents the data as a graph and then transforms the clustering problem into a graph partitioning problem. Many studies report that graph clustering is more competitive than other clustering methods. In essence, graph clustering can handle almost any manifold-shaped data, which overcomes the limitation that many traditional clustering methods perform well only on convex-shaped data. Graph clustering has therefore been widely studied. This thesis focuses on graph clustering algorithms, graph optimization theory, and semi-supervised learning. The purpose of the research is to explore the basic framework of graph clustering algorithms, identify their advantages and disadvantages, and propose new graph clustering algorithms. As the main research results, after an in-depth discussion of the low-rank doubly stochastic matrix decomposition clustering algorithm (Data-Cluster-Data, DCD) [25] proposed by Yang et al. in 2016, two new graph clustering algorithms are proposed:

1) Graph-Optimized DCD (GoDCD). The DCD algorithm has the shortcoming that "the quality of the clustering depends heavily on the quality of the initial similarity matrix". GoDCD introduces graph optimization into DCD by alternately optimizing the similarity matrix and the cluster indicator matrix. As a result, GoDCD produces better clusterings than the DCD algorithm.

2) Semi-Supervised DCD (SSDCD). Because DCD is an unsupervised clustering algorithm, it does not take advantage of any prior knowledge (e.g., partially or weakly labeled data), which often leads to poor clustering results. In practice, a small amount of weakly labeled data, such as pairwise constraint information, is often available. This thesis introduces pairwise constraints into the DCD model and generalizes DCD from the unsupervised case to the semi-supervised case, yielding SSDCD, which effectively improves the quality of clustering.
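
The general pipeline described above (represent the data as a graph, then partition the graph) can be illustrated with a minimal sketch. This is not the DCD, GoDCD, or SSDCD algorithm; it substitutes a standard two-way spectral partition purely for illustration. The Gaussian similarity, the bandwidth `sigma`, the two-ring toy data, and all function names are assumptions introduced here, not taken from the thesis.

```python
import numpy as np

def ring(radius, n):
    # n evenly spaced points on a circle: a simple non-convex cluster shape
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack((radius * np.cos(angles), radius * np.sin(angles)))

def gaussian_similarity(X, sigma):
    # Step 1: turn the data into a weighted graph (assumed Gaussian kernel)
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def spectral_bipartition(W):
    # Step 2: partition the graph using the sign pattern of the eigenvector
    # for the second-smallest eigenvalue of the normalized Laplacian
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Two concentric rings: first 40 points on the inner ring, last 40 on the outer
X = np.vstack((ring(1.0, 40), ring(3.0, 40)))
labels = spectral_bipartition(gaussian_similarity(X, sigma=0.5))
```

On this toy data each ring should receive a single label even though neither ring is convex, which is the property the abstract attributes to graph clustering. The result depends on the similarity matrix, here through the choice of `sigma`, which echoes the sensitivity to the initial similarity matrix that motivates GoDCD.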
Keywords/Search Tags: Graph clustering, Doubly stochastic matrix, Multiplicative updates, Graph optimization theory, Semi-supervised learning