Font Size: a A A

Improving Deep Clustering Via Embedding Selection And Ensemble Learning

Posted on:2020-10-04Degree:MasterType:Thesis
Country:ChinaCandidate:J H HanFull Text:PDF
GTID:2428330623463623Subject:Computer technology major
Abstract/Summary:PDF Full Text Request
Clustering is one of the most fundamental problems in unsupervised learning area.The purpose of clustering is to gather similar instances into the same group,which is called a cluster,and thus to reveal the underlying data structure.Traditional clustering algorithms are limited by computational complexity and deficiencies at the algorithm design level,and often cannot handle more complex and structured data?clustering of pictures or videos?.Deep clustering models are developed to reveal complex data structures to relatively low-dimensional spaces with deep neural networks and produce remarkable clustering results.In this paper,we focus on the initial-state sensitivity problem of most previous deep clustering models.These models usually first pre-train autoencoders and then perform clustering based on the embedding vectors,yet the lack of supervision in pre-training makes their clustering quality highly relies on parameter initialization.The experimental results show that different neural network initialization values will produce clustering results with different accuracy,and the experimental results have a large diversity,that is,the prediction of the same sample will produce different class labels.For example,the deep clustering model DCEC[1]has a standard deviation of 4.1%for clustering accuracy?ranging from 86%to 95%?in 100 experiments on MNIST[2],which is a very high variance for such a clustering task.To solve the issue,in this paper we propose purity score to assess the quality of pre-trained embeddings.Embeddings with clear partition are considered as high-quality embeddings,which achieve a high purity score,otherwise with a low purity score.Experiments show that high quality embedding can produce more accurate clustering results.Then we take the advantages of ensemble learning and propose a novel clustering method named Deep Clustering Ensemble?DCE?.DCE first trains several autoencoders,and performs embedding selection by measuring qualities of the pre-trained embeddings.Then based on the original loss function,a new loss function based on the average clustering result is proposed,and finally combines high-quality ones together to further enhance the clustering performance.Our experiments on three real-world datasets show that DCE significantly improves clustering results compared to the state-of-the-art deep clustering models.Future work about deep clustering ensemble may include directly getting the superior initial feature embeddings and using weighted methods to perform ensemble learning.
Keywords/Search Tags:Deep Clustering, Convolutional Autoencoders, Deep Neural Networks, Ensemble, Embedding Selection
PDF Full Text Request
Related items