In the information era, the influence of data continues to expand. Uncovering the underlying patterns and insights in collected data, and extracting valuable information from it to guide production and daily life, is therefore of great research significance. Hardware advances such as faster processors and cheaper storage have paved the way for large-scale, high-dimensional data analysis, and machine learning and artificial intelligence have become important supporting technologies for high-dimensional data mining. However, high-dimensional data are pervaded by redundancy and noise, and samples are sparse in high-dimensional space, which makes it difficult to measure the distance or similarity between them directly. Because of these factors, learning methods that are effective in low-dimensional spaces degrade significantly or even fail on high-dimensional data, which poses severe challenges to high-dimensional data mining. Unsupervised learning is a key technique for mining the intrinsic structure of high-dimensional data and for exploring the essential connections and latent regularities within it. This dissertation focuses on clustering analysis and anomaly detection for unlabelled high-dimensional data. The main work and contributions are as follows.

A locally weighted subspace clustering ensemble approach is proposed, which fuses the clusters of different subspaces to obtain more accurate clustering solutions. A core cluster in high-dimensional space is defined as a set of samples that are assigned to the same cluster in every base subspace. The granularity of a core cluster lies between that of a single sample and that of a cluster, so treating core clusters as the basic units of the ensemble process improves integration efficiency to a certain extent. The dissertation evaluates the stability of clusters by measuring the distances between pairs of core clusters in each base subspace and the similarity between core clusters and the clusters of each base subspace, and then weights the subspace clustering solutions accordingly. Four core-cluster-based weighted ensemble methods are proposed to fuse the clusters of the base subspaces into a consensus clustering solution. Comparative experiments on multiple image datasets and gene expression datasets verify the validity of the proposed approaches: compared with state-of-the-art clustering ensemble approaches, the proposed subspace clustering ensemble approaches achieve higher clustering accuracy, greater robustness to parameter settings, and better integration efficiency. (Chapter 3)

A deep embedding Auto-Encoder clustering model is proposed. The dissertation embeds an Auto-Encoder in both the encoder unit and the decoder unit of the prototype Auto-Encoder, constructing a symmetrical Auto-Encoder network architecture that has better feature representation ability than the prototype Auto-Encoder and can effectively learn clustering-friendly feature representations. In the pre-training stage, the hidden-layer coding is learned by minimizing the model's reconstruction loss. To obtain a smoother and more continuous manifold in the hidden layer, a constraint on the hidden-layer coding is added to the objective function, and a weight parameter controls the trade-off between reconstruction accuracy and the smoothness constraint, which significantly improves the representation ability of the hidden-layer coding. In the fine-tuning stage, the samples are clustered in the hidden-layer coding space, the centroid of each cluster is used as the label of the samples assigned to it, and the model is trained in a supervised manner. A self-paced learning method is adopted to select the samples used for fine-tuning, which prevents samples on cluster boundaries from participating in training and also yields better convergence. Experiments on multiple face image datasets, handwriting datasets, and item datasets, together with analysis and comparison of the results, demonstrate the effectiveness of the proposed approach. (Chapter 4)

A two-stage unsupervised anomaly detection model based on the deep embedding Auto-Encoder is proposed, which achieves end-to-end anomaly detection. Building on the embedding Auto-Encoder, the dissertation designs a feature extraction approach for image anomaly detection whose extracted features are highly robust. In the pre-training stage, a network mapping function is learned through the model's reconstruction loss, so that the hidden-layer coding is a valid representation of the samples and is better suited to the subsequent anomaly detection task. In the fine-tuning stage, an effective surrogate supervision approach is proposed based on the clustering loss of the deep embedding Auto-Encoder: the mean vector of the hidden-layer codings of the input samples is used as a proxy label, which provides supervision information for model fine-tuning. A self-paced learning method is adopted to iteratively fine-tune the model on reliable samples. An anomaly scoring strategy is designed to evaluate whether input samples are anomalous. Experiments on several image datasets, in comparison with state-of-the-art unsupervised anomaly detection approaches, demonstrate the superior performance of the proposed approach on unsupervised image anomaly detection tasks. (Chapter 5)...
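The core-cluster definition given for Chapter 3 (samples that fall into the same cluster in every base subspace) can be illustrated with a short Python sketch. It assumes each base subspace clustering is supplied as a list of integer labels over the same samples; the function names `core_clusters` and `core_cluster_labels` are hypothetical, and the sketch covers only how core clusters are formed and reduced to ensemble units, not the dissertation's stability weighting or its four fusion methods.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def core_clusters(base_labels: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group samples that share the same cluster label in every base clustering.

    Assumed (hypothetical) input layout: base_labels[m][i] is the label of
    sample i in base subspace m.
    """
    if len(base_labels) == 0:
        return []
    n = len(base_labels[0])
    if any(len(lab) != n for lab in base_labels):
        raise ValueError("all base clusterings must label the same samples")

    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i in range(n):
        # Label signature of sample i across all base subspaces; samples with
        # identical signatures belong to the same core cluster.
        signature = tuple(lab[i] for lab in base_labels)
        groups[signature].append(i)

    # dicts keep insertion order, so core clusters appear in order of
    # their first (lowest-index) sample.
    return list(groups.values())


def core_cluster_labels(
    base_labels: Sequence[Sequence[int]], cores: List[List[int]]
) -> List[List[int]]:
    """Relabel each base clustering at the core-cluster level.

    All members of a core cluster share a label in every subspace, so the
    first member's label represents the whole core cluster.
    """
    return [[lab[core[0]] for core in cores] for lab in base_labels]


if __name__ == "__main__":
    labels = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1]]  # two toy base clusterings
    cores = core_clusters(labels)
    print(cores)
    print(core_cluster_labels(labels, cores))
```

In this toy case, samples 0 and 1 agree in both base clusterings and collapse into one core cluster, so any subsequent ensemble step works on four units instead of five samples; the saving grows as more samples agree across all subspaces.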