
Research About Improved DEC Algorithm And Semi-Supervised Learning Method

Posted on: 2021-05-09  Degree: Master  Type: Thesis
Country: China  Candidate: Z H Yan
GTID: 2518306512990619  Subject: Statistics
Abstract/Summary:
Cluster analysis is a family of unsupervised machine learning methods. It partitions unlabeled data into groups so that samples within a group have similar features, while samples in different groups differ markedly. The main problems in clustering include the choice of distance metric and the choice of the optimal number of clusters (the k value). Semi-supervised clustering improves the performance of a clustering model by applying semi-supervised learning. It uses the label information of a partially labeled dataset to learn better initial cluster centers, distance measures and other important influencing factors, and it can also supervise the clustering process while iteratively optimizing the model parameters. It can therefore improve both the performance and the convergence rate of a clustering method.

Deep embedding clustering (DEC) is an unsupervised clustering method for high-dimensional data. It clusters in the low-dimensional space produced by an autoencoder, takes a self-defined high-confidence distribution as the target distribution, and jointly optimizes the parameters of the dimension-reduction network and the initialized cluster centers to complete the clustering task. The problem addressed in this thesis is that the reduced feature space is unknown, so the differing scales and the relative importance of the feature dimensions cannot be determined in advance. First, a weighted Mahalanobis distance based on the entropy weight method is applied to the DEC algorithm to improve the distance metric in the feature space. In addition, a Gap statistic (GS) method based on the weighted Mahalanobis distance is given to determine the optimal number of clusters. Experiments are used to demonstrate the superiority and feasibility of the proposed formula and method.

Second, building on the DEC algorithm, this thesis uses partial label information to introduce a semi-supervised clustering method. We establish a new objective function and use a parameter initialization method based on semi-supervised learning to obtain better weight parameter estimates under the improved model. We compare the method with the DEC algorithm while varying the proportion of labeled data, in experiments on four datasets: Toutiao, Reuters-10k, MNIST and Fashion MNIST. The experimental results show that the Semi-Supervised DEC algorithm can obtain the optimal weight parameters by introducing semi-supervised label information, and so learns a better classification decision boundary. This indicates that the algorithm is effective and feasible.
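The abstract does not give formulas, so the following is only a minimal sketch of the pieces it names, under stated assumptions. The soft assignment and target distribution follow the standard DEC formulation (a Student's t kernel and a squared, frequency-normalized target). The entropy weights follow the usual entropy weight method. The way the weights enter the Mahalanobis distance here (scaling the inverse covariance by the square roots of the weights) is one plausible form chosen for illustration, not necessarily the thesis's definition. All function names are hypothetical.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method: one non-negative weight per column, summing to 1."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Shift each column to be non-negative, then turn it into proportions.
    shifted = X - X.min(axis=0)
    col_sum = shifted.sum(axis=0)
    safe_sum = np.where(col_sum > 0, col_sum, 1.0)
    # A constant column carries no information: treat it as uniform (entropy 1).
    P = np.where(col_sum > 0, shifted / safe_sum, 1.0 / n)
    # Normalized Shannon entropy per column; 0 * log(0) is taken as 0.
    logP = np.log(np.where(P > 0, P, 1.0))
    entropy = -(P * logP).sum(axis=0) / np.log(n)
    diversity = np.clip(1.0 - entropy, 0.0, None)
    total = diversity.sum()
    if total <= 0:
        # All columns constant: fall back to uniform weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return diversity / total

def weighted_sq_mahalanobis(Z, centers, weights, cov):
    """Squared distance (z - mu)^T W^(1/2) S^(-1) W^(1/2) (z - mu).

    ASSUMPTION: this weighting form is illustrative only.
    """
    scale = np.sqrt(weights)
    M = scale[:, None] * np.linalg.pinv(cov) * scale[None, :]
    diff = Z[:, None, :] - centers[None, :, :]          # shape (n, k, d)
    return np.einsum("nkd,de,nke->nk", diff, M, diff)   # shape (n, k)

def soft_assign(dist2, alpha=1.0):
    """DEC soft assignment q_ij from squared distances (Student's t kernel)."""
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(Q):
    """DEC target p_ij: square q_ij, divide by soft cluster frequency, renormalize."""
    p = Q ** 2 / Q.sum(axis=0)
    return p / p.sum(axis=1, keepdims=True)

# Toy embedded points standing in for autoencoder outputs (synthetic data).
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))
centers = Z[rng.choice(200, size=3, replace=False)]

w = entropy_weights(Z)
d2 = weighted_sq_mahalanobis(Z, centers, w, np.cov(Z, rowvar=False))
Q = soft_assign(d2)
P = target_distribution(Q)
kl = float(np.sum(P * np.log(P / Q)))   # the KL(P || Q) clustering loss
print("feature weights:", w)
print("KL(P || Q):", kl)
```

In a full DEC training loop the KL term would be minimized by gradient descent over both the encoder parameters and the cluster centers; the sketch only evaluates it once on fixed points.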
Keywords/Search Tags: Clustering Method, Semi-Supervised Learning, Information Entropy, Weighted Mahalanobis Distance, Deep Embedding Clustering Algorithm