With the advent of the big data era, people need to process large volumes of high-dimensional data. As an unsupervised machine learning method, clustering is of great significance in both production practice and theoretical research. Clustering has been studied for a long time, but on massive high-dimensional data the performance of traditional clustering methods suffers from the "curse of dimensionality". Subspace clustering emerged in response to the characteristics of high-dimensional data: it avoids the curse of dimensionality by seeking several low-dimensional linear subspaces for the high-dimensional data. For data whose characteristics may be nonlinear, subspace clustering based on nonlinear methods has been developed. In recent years, deep learning has been widely used in machine learning tasks because of its strong adaptive feature extraction ability, and combining subspace clustering with deep learning frameworks has produced many effective deep subspace clustering methods. Deep subspace clustering has achieved remarkable performance in unsupervised clustering tasks, and self-supervised methods have been further introduced to learn discriminative data representations that improve clustering performance. Despite this significant improvement, the self-supervised approach depends heavily on the quality of the pseudo-labels derived from the current clustering result, and clustering performance inevitably degrades when a large number of samples are assigned incorrect pseudo-labels. To solve this issue, we develop a robust self-supervised deep subspace clustering
approach by mining and exploiting reliable self-supervised information during training. Specifically, a diffusion step based on graph random walks is first developed to refine the self-expressiveness matrix so that more accurate clustering results (pseudo-labels) can be obtained. More importantly, we introduce an outlier detection approach to identify incorrect pseudo-labels within each identified cluster, so that the effect of unreliable self-supervision is alleviated during network training. Experimental studies on several benchmark datasets validate the effectiveness of our approach in discovering reliable self-supervised information during network training.
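
The two steps named above, diffusion over the self-expressiveness graph and per-cluster filtering of pseudo-labels, can be illustrated with a minimal sketch. The abstract does not specify the exact diffusion rule or outlier detector, so everything concrete here is an assumption chosen for illustration: a random walk with restart for the diffusion, a median-absolute-deviation rule on centroid distances for the outlier test, and the hypothetical names `diffuse_affinity`, `reliable_mask`, `alpha`, `n_steps`, and `z_thresh`. It should not be read as the method actually used in this work.

```python
import numpy as np


def diffuse_affinity(C, alpha=0.8, n_steps=20):
    """Smooth a self-expressiveness matrix C with a random walk with restart.

    Assumption: this particular diffusion rule is illustrative only.
    """
    A = np.abs(C)
    W = 0.5 * (A + A.T)                     # symmetric, non-negative affinity
    np.fill_diagonal(W, 0.0)                # drop self-loops
    row_sums = W.sum(axis=1, keepdims=True)
    P = W / np.maximum(row_sums, 1e-12)     # row-stochastic transition matrix
    n = W.shape[0]
    identity = np.eye(n)
    S = identity.copy()
    for _ in range(n_steps):
        # One walk step, then restart at the source node with prob. 1 - alpha.
        S = alpha * (S @ P) + (1.0 - alpha) * identity
    return 0.5 * (S + S.T)                  # symmetrize for spectral clustering


def reliable_mask(Z, labels, z_thresh=2.0):
    """Mark samples whose pseudo-label looks reliable.

    Assumption: a sample is treated as an outlier of its cluster when its
    distance to the cluster centroid exceeds the cluster median distance by
    more than z_thresh robust (MAD-based) standard deviations.
    """
    labels = np.asarray(labels)
    mask = np.zeros(labels.shape[0], dtype=bool)
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        dist = np.linalg.norm(Z[idx] - Z[idx].mean(axis=0), axis=1)
        med = np.median(dist)
        mad = np.median(np.abs(dist - med)) + 1e-12
        mask[idx] = (dist - med) / (1.4826 * mad) <= z_thresh
    return mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy self-expressiveness matrix: two blocks plus small noise.
    C = np.kron(np.eye(2), np.ones((5, 5))) + 0.05 * rng.random((10, 10))
    S = diffuse_affinity(C)
    # Toy embeddings with placeholder pseudo-labels.
    Z = rng.normal(size=(10, 3))
    labels = np.repeat([0, 1], 5)
    print(S.shape, reliable_mask(Z, labels))
```

In a training loop of this kind, the diffused matrix would be passed to spectral clustering to obtain pseudo-labels, and only the samples kept by the mask would contribute to the self-supervised loss.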