Research On Determination Of Cluster Numbers And Clustering Algorithms Based On Deep Learning Methods

Posted on: 2019-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wang
Full Text: PDF
GTID: 2428330611993342
Subject: Computer Science and Technology
Abstract/Summary:
Clustering is one of the most fundamental tasks in machine learning and has fostered abundant research. Most traditional clustering algorithms are applied directly in the original data space, so their performance degrades when the data points lie in a high-dimensional space. Recently proposed deep clustering approaches seek to overcome this shortcoming by training a deep neural network to construct a nonlinear embedding of the original data into a low-dimensional feature space, where the data are then clustered. Although they demonstrate promising performance in various applications, existing deep clustering algorithms usually require the number of clusters to be specified in advance. This number is usually unknown beforehand, which prevents these algorithms from being applied in practice. To address this challenge, we conduct a detailed analysis of existing deep clustering algorithms and explore the nature of their problems. Our work is divided into the following two parts:

(1) We propose Deep Embedding for Determining the Number of Clusters (DED), a method that jointly solves for the unknown number of clusters and feature extraction. DED first combines the strengths of the convolutional autoencoder and the t-SNE technique to extract low-dimensional embedded features. It then determines the number of clusters using an improved density-based clustering algorithm. DED can serve as an important complementary component to many existing algorithms: when the number of clusters K is not given, DED can provide K for high-dimensional datasets, which enables other deep clustering methods to train specific models. Our experimental evaluation on image datasets shows superior performance of DED over state-of-the-art methods and robustness with respect to hyperparameter settings.

(2) We propose Deep Density Clustering with Similarity Preservation (DDC-SP), a method that automatically determines the number of clusters and achieves a satisfactory clustering result. The key idea is to learn a low-dimensional, cluster-oriented feature space that preserves pairwise similarity and in which the number of clusters can be determined automatically. Specifically, we first propose a deep learning model that consists of an autoencoder and a pairwise similarity preservation network attached to the hidden layer of the autoencoder. The model is trained by minimizing the reconstruction loss and the Kullback-Leibler divergence between the pairwise distance probabilities of the data points and those of the feature representations. Next, the cluster centers are determined by analyzing the density characteristics of the learned feature space. Finally, we fine-tune the model by minimizing a cluster assignment hardening loss together with the reconstruction loss to further boost the separability of the clusters. Extensive experiments indicate that the proposed DDC-SP: 1) can accurately estimate the number of clusters; 2) achieves better or comparable performance compared with state-of-the-art competitors that require the number of clusters to be specified in advance.
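The step of reading the number of clusters off the density structure of a feature space can be illustrated with a small sketch. The abstract does not specify the improved density-based algorithm used in the thesis, so the code below substitutes the generic density-peaks heuristic (local density multiplied by the distance to the nearest denser point) as a stand-in. The function names, the percentile cutoff, and the largest-ratio rule for choosing K are all assumptions made for illustration, and the input is synthetic 2-D data standing in for learned embedded features.

```python
import numpy as np


def density_peak_scores(X, percentile=5.0):
    """Return (rho, delta) for each row of X (hypothetical helper).

    rho   : Gaussian-kernel local density of each point.
    delta : distance from each point to the nearest point of higher density.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))

    # Cutoff distance d_c: a low percentile of all pairwise distances
    # (an assumed heuristic, not taken from the thesis).
    d_c = np.percentile(D[np.triu_indices(n, k=1)], percentile)

    # Subtract 1 to remove each point's own contribution exp(0) = 1.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0

    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.empty(n)
    # Convention: the densest point gets its largest distance to any point.
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        delta[i] = D[i, order[:rank]].min()
    return rho, delta


def estimate_num_clusters(X, max_k=10, percentile=5.0):
    """Estimate K as the position of the largest drop in gamma = rho * delta.

    The largest-ratio rule is an assumption made for this sketch.
    """
    rho, delta = density_peak_scores(X, percentile)
    gamma = np.sort(rho * delta)[::-1][: max_k + 1]
    # Ratio between consecutive sorted scores; cluster centers sit before
    # the biggest drop.
    ratios = gamma[:-1] / (gamma[1:] + 1e-12)
    return int(np.argmax(ratios)) + 1


if __name__ == "__main__":
    # Three well-separated synthetic blobs standing in for embedded features.
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((60, 2)) for c in centers])
    print(estimate_num_clusters(X))
```

Points that are both dense and far from any denser point get a large score and are treated as candidate cluster centers; the count of such points before the sharpest drop in score is taken as K. The sketch builds the full distance matrix, so it suits only small datasets.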
Keywords/Search Tags: Deep Clustering, Determination of Number of Clusters, Feature Extraction, Deep Learning, Density-based Clustering