Research On Determination Of Cluster Numbers And Clustering Algorithms Based On Deep Learning Methods

Posted on:2019-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Wang

GTID:2428330611993342

Subject:Computer Science and Technology

Abstract/Summary:

Clustering is one of the most fundamental tasks in machine learning and has fostered abundant research. Most traditional clustering algorithms are applied directly in the original data space, so their performance degrades when the data points lie in a high-dimensional space. Recently proposed deep clustering approaches seek to overcome this shortcoming by training a deep neural network to construct a nonlinear embedding of the original data into a low-dimensional feature space, where the data are then clustered. Although they demonstrate promising performance in various applications, existing deep clustering algorithms usually require the number of clusters to be specified in advance. This number is usually unknown, which limits their use in practical applications. To address this challenge, we conduct a detailed analysis of existing deep clustering algorithms and explore the nature of their problems. Our work is divided into the following two parts:

(1) We propose Deep Embedding for Determining the Number of Clusters (DED), a method that jointly solves for the unknown number of clusters and feature extraction. DED first combines the strengths of the convolutional autoencoder and the t-SNE technique to extract low-dimensional embedded features. It then determines the number of clusters using an improved density-based clustering algorithm. DED can serve as a component complementary to many existing algorithms: when the number of clusters K is not given, DED can provide K for high-dimensional datasets, which enables other deep clustering methods to train specific models. Our experimental evaluation on image datasets shows superior performance of DED over state-of-the-art methods and robustness with respect to hyperparameter settings.

(2) We propose Deep Density Clustering with Similarity Preservation (DDC-SP), a method that automatically determines the number of clusters and achieves a satisfactory clustering result. The key idea is to learn a low-dimensional, cluster-oriented feature space that preserves pairwise similarity and in which the number of clusters can be determined automatically. Specifically, we first propose a deep learning model that consists of an autoencoder and a pairwise similarity preservation network attached to the hidden layer of the autoencoder. The model is trained by minimizing the reconstruction loss and the Kullback-Leibler divergence between the pairwise distance probabilities of the data points and those of the feature representations. Next, the cluster centers are determined by analyzing the density characteristics of the learned feature space. Finally, we fine-tune the model by minimizing a cluster assignment hardening loss and the reconstruction loss to further boost the separability of clusters. Extensive experiments indicate that the proposed DDC-SP (1) can accurately estimate the number of clusters and (2) achieves better or comparable performance compared with state-of-the-art competitors that require the number of clusters to be specified in advance.
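Both parts of the work determine the number of clusters from density in a learned low-dimensional feature space. The abstract does not specify the improved density-based algorithm, so the sketch below is only an illustration of the general idea, using the generic density-peaks heuristic (local density times distance to the nearest denser point). The function name `estimate_num_clusters`, the parameters `dc_percentile` and `n_std`, and the outlier threshold rule are all assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def estimate_num_clusters(X, dc_percentile=2.0, n_std=3.0):
    """Estimate the number of clusters with a density-peaks style heuristic.

    Hypothetical sketch: X is an (n, d) array of embedded features. Points whose
    product rho * delta stands far above the rest are treated as cluster centers.
    """
    n = X.shape[0]
    # Pairwise Euclidean distances between all embedded points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Cutoff distance: a low percentile of all pairwise distances (assumed rule).
    iu = np.triu_indices(n, k=1)
    dc = np.percentile(dist[iu], dc_percentile)

    # Local density with a Gaussian kernel; subtract 1 to drop the self term.
    rho = np.exp(-(dist / dc) ** 2).sum(axis=1) - 1.0

    # delta: distance to the nearest point of strictly higher density.
    # The densest point gets its largest distance to any other point.
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        delta[i] = dist[i, higher].min() if higher.any() else dist[i].max()

    # Centers are outliers in gamma = rho * delta (assumed threshold rule).
    gamma = rho * delta
    threshold = gamma.mean() + n_std * gamma.std()
    centers = np.flatnonzero(gamma > threshold)
    return len(centers), centers

if __name__ == "__main__":
    # Toy stand-in for an embedded feature space: three compact 2-D blobs.
    rng = np.random.default_rng(0)
    means = [(0, 0), (10, 0), (0, 10)]
    X = np.vstack([rng.normal(loc=m, scale=0.5, size=(100, 2)) for m in means])
    k, centers = estimate_num_clusters(X)
    print("estimated number of clusters:", k)
```

On compact, well-separated groups the product of density and separation tends to single out one point per group, which is why the count of such points can stand in for K. The quality of the estimate depends on the cutoff distance and the threshold, both of which are simple placeholder choices here.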

Keywords/Search Tags:

Deep Clustering, Determination of Number of Clusters, Feature Extraction, Deep Learning, Density-based Clustering

Related items

1	Research On Clustering Algorithm Based On Automatic Determination Of Class Number Technology
2	Research On Key Technologies Of Clustering Based On Deep Learning
3	Research On And Application Of Clustering Algorithms Based On Deep Learning
4	Study On Parameter-free Peak Clustering Algorithm
5	Research And Application On Determining Optimal Number Of Clusters In Cluster Analysis
6	Research On Deep Semi-supervised Clustering Algorithm
7	The Research On Arbitrary Shape Cluster Algorithm Based On Hierarchy And Density
8	A Research Of Deep Learning Based Clustering Algorithm
9	Research On Skin Detection Algorithm Based On Deep Learning And Density Peak Clustering
10	Density-sensitive K-means Clustering Algorithm