| Clustering is an important task in machine learning, computer vision, and related disciplines. As a major branch of unsupervised learning, it is widely applied in image processing, user recommendation, pattern recognition, data analysis, anomaly detection, and other fields. Clustering aims to group similar data objects into clusters by analyzing their characteristics, so a clustering method can also be understood as a data compression method. Because clustering does not require large numbers of collected labels or labeled training tuples, it has become an important means of unsupervised data analysis and processing. As data grows in scale and real-world data grows in complexity, data can be lost in many ways while the original data is exchanged and disseminated. The most widespread scenario is that some feature dimensions of the data are missing. Existing algorithms for clustering missing data fall into two main groups: shallow and deep. Shallow missing-data clustering algorithms separate the data filling (imputation) task from the clustering task, and their clustering performance suffers as a result, while deep missing-data clustering algorithms mostly target the case of missing samples and cannot handle missing features. Based on this, this paper takes the task of clustering data with missing feature dimensions as its starting point and combines relevant theories and techniques, such as the Student’s t-distribution and the optimal transport distance, to construct an accurate and efficient model for missing data and to design a one-step deep clustering method that improves overall performance in parameter storage, running speed, and clustering accuracy. The main research contents are as follows: 1. To address the separation of the data filling and clustering tasks, a deep missing-data clustering method based on the Student’s t-distribution is proposed. Based on the autoencoder 
network, this method constructs an end-to-end deep incomplete-data clustering algorithm. Jointly optimizing the KL-divergence-based clustering layer and the mean squared error reconstruction loss of the autoencoder combines the data filling and clustering tasks. The method is experimentally compared with several existing methods on three large-scale datasets, which verifies its superior performance on the missing-data clustering task. 2. By introducing optimal transport theory into the clustering task, a deep missing-data clustering method based on the optimal transport distance is proposed to address the problem of preserving the distribution of missing data in a neural network. First, we theoretically analyze why existing missing-data clustering methods fail on high-dimensional data, and then propose a novel end-to-end deep missing-data clustering network that minimizes the difference between the original distribution and the reconstructed distribution. The optimal transport distance between the samples preserves the distribution characteristics of the samples and optimizes the clustering results. Extensive experiments demonstrate the effectiveness of optimal transport in the deep missing-data clustering task, and the method greatly improves clustering performance. |
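
To make the first contribution more concrete, the following is a minimal NumPy sketch of how a KL-divergence clustering term built on a Student's t kernel might be combined with a mean squared error reconstruction term. The abstract does not give the exact loss, so everything here is an assumption for illustration: the soft assignment and sharpened target follow the widely used Deep Embedded Clustering formulation, the reconstruction error is restricted to observed entries through a hypothetical binary `mask`, and the weight `gamma` and all function names are invented for this example. Random arrays stand in for the encoder and decoder outputs.

```python
import numpy as np

def soft_assign(z, centers, alpha=1.0):
    """Soft cluster assignments from a Student's t kernel (assumed DEC-style)."""
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets that emphasise confident assignments."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def masked_mse(x, x_hat, mask):
    """Mean squared error over observed entries only (mask == 1)."""
    return float(np.sum(mask * (x - x_hat) ** 2) / np.maximum(mask.sum(), 1))

def joint_loss(x, x_hat, mask, z, centers, gamma=0.1):
    """Reconstruction loss plus a weighted clustering loss (gamma is assumed)."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    return masked_mse(x, x_hat, mask) + gamma * kl_divergence(p, q)

# Toy data: random arrays stand in for the autoencoder's outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
mask = (rng.random(x.shape) > 0.3).astype(float)  # 1 = observed feature
x_hat = x + 0.1 * rng.normal(size=x.shape)        # stand-in decoder output
z = rng.normal(size=(6, 2))                       # stand-in encoder output
centers = rng.normal(size=(3, 2))

q = soft_assign(z, centers)
p = target_distribution(q)
loss = joint_loss(x, x_hat, mask, z, centers)
print("joint loss:", loss)
```

In an actual network both terms would be minimized together by gradient descent over the autoencoder weights and the cluster centers, which is what lets filling and clustering inform each other instead of running as separate stages.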
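
For the second contribution, one common way to obtain a distance between the original samples and their reconstructions is entropy-regularized optimal transport solved with Sinkhorn iterations. The abstract does not say how the optimal transport distance is computed, so the sketch below is only an assumed stand-in: the squared Euclidean cost, the uniform sample weights, the regularization strength `reg`, the iteration count, and the function name `sinkhorn_distance` are all illustrative choices.

```python
import numpy as np

def sinkhorn_distance(x, y, reg=0.05, n_iters=500):
    """Entropy-regularized OT cost between two point clouds (assumed setup)."""
    n, m = x.shape[0], y.shape[0]
    a = np.full(n, 1.0 / n)  # uniform weights on the original samples
    b = np.full(m, 1.0 / m)  # uniform weights on the reconstructed samples
    # Pairwise squared Euclidean cost, shape (n, m).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    scale = cost.max()
    if scale == 0.0:
        return 0.0, np.outer(a, b)
    # Rescale the cost to [0, 1] so the Gibbs kernel does not underflow.
    kernel = np.exp(-(cost / scale) / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    # Report the transport cost in the original (unscaled) units.
    return float(np.sum(plan * cost)), plan

# Toy check: a close reconstruction should be cheaper to transport to
# than a strongly shifted one.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
x_near = x + 0.01 * rng.normal(size=x.shape)
x_far = x + 10.0

d_near, plan_near = sinkhorn_distance(x, x_near)
d_far, plan_far = sinkhorn_distance(x, x_far)
print("near:", d_near, "far:", d_far)
```

Used as a training loss, a term like this penalizes reconstructions whose overall distribution drifts from that of the input batch, which matches the abstract's stated goal of preserving the distribution characteristics of the samples.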