Improvement Of Spectral Clustering Algorithm Based On Local Principal Component Analysis And Self-paced Learning

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: T Tong
GTID: 2428330629453122
Subject: Software engineering
Abstract/Summary:
The rapid development of Internet technology has produced massive amounts of data, and extracting useful information from these data has become a hot research topic. Clustering, a classic unsupervised machine learning method, has been widely studied because it can reveal the inherent structure of data. Real-world datasets are often of uneven quality: they frequently contain noise and outliers, and their actual distribution is usually complicated. In addition, some information may be lost while the data are being collected and stored. However, most existing spectral clustering methods do not take these issues into consideration, so the resulting models are not robust. This thesis focuses on clustering data with complex distributions and noise. It aims to improve the robustness of the traditional spectral clustering algorithm, especially its ability to handle datasets with missing values and noise, and thereby to improve clustering performance. The specific research contents are as follows.

First, an improved spectral clustering algorithm based on local principal component analysis is proposed. Samples in the dataset are first selected through automatic learning to reduce the impact of low-quality samples on the clustering model. Local principal component analysis is then applied so that the low-dimensional data obtained from spectral decomposition better retain both the global and the local information of the original data. Next, a connected graph decomposition algorithm outputs the clustering result without requiring the number of clusters to be specified. Finally, the remaining samples are assigned to clusters by distance.

Second, a one-step spectral clustering algorithm based on missing value processing and self-paced learning is proposed. A one-step spectral clustering model is used to eliminate the cumulative error that intermediate steps may introduce. Missing value processing is then fused into the model to make full use of the information that remains in incomplete samples. Self-paced learning is introduced to rank samples by importance and to use data of different quality at different learning stages, which mitigates the impact of outliers and noise on the clustering model. Finally, spectral rotation is performed on the obtained clustering results to reduce the influence of the random hyperplane on the spectral graph decomposition, which further improves the clustering results.

In summary, this thesis addresses two shortcomings of traditional spectral clustering algorithms: they require the number of clusters to be specified, and they cannot handle datasets with noise, outliers, and missing values. The research uses techniques such as local principal component analysis, connected graph decomposition, self-paced learning, one-step spectral clustering, and missing value processing. Experiments on multiple real datasets, evaluated with several metrics, show that the improved spectral clustering algorithms proposed in this thesis outperform the comparison algorithms on every metric. In future work, I plan to combine the feature extraction capability of neural networks with traditional clustering methods to further improve performance.
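The connected graph decomposition step of the first algorithm can be illustrated with a minimal NumPy sketch. It assumes that sample selection and the local-PCA spectral embedding have already produced the rows of `selected`. It then links any two points closer than a hand-picked threshold `eps` (an ε-neighbourhood graph) and finds connected components with breadth-first search; each component becomes one cluster, so the number of clusters is never given as input. The remaining samples receive the label of their nearest selected sample. The function names, the ε-graph, and the nearest-neighbour rule are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np
from collections import deque


def connected_components(adj):
    """Label each node of a boolean adjacency matrix with its component id (BFS)."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already reached from an earlier start node
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    queue.append(nb)
        current += 1
    return labels


def cluster_by_components(selected, remaining, eps):
    """Hypothetical pipeline: eps-graph on selected samples -> components -> 1-NN for the rest."""
    # Pairwise Euclidean distances among the selected (already embedded) samples.
    d = np.linalg.norm(selected[:, None, :] - selected[None, :, :], axis=-1)
    adj = (d <= eps) & ~np.eye(len(selected), dtype=bool)
    labels_sel = connected_components(adj)  # number of clusters = number of components
    # Each remaining sample takes the label of its nearest selected sample.
    d_rem = np.linalg.norm(remaining[:, None, :] - selected[None, :, :], axis=-1)
    labels_rem = labels_sel[np.argmin(d_rem, axis=1)]
    return labels_sel, labels_rem


# Usage: two well-separated blobs; the cluster count is discovered, not specified.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.1, size=(20, 2))
b = rng.normal(5.0, 0.1, size=(20, 2))
selected = np.vstack([a, b])
remaining = np.array([[0.2, -0.1], [4.8, 5.1]])
labels_sel, labels_rem = cluster_by_components(selected, remaining, eps=1.0)
print(len(set(labels_sel.tolist())))  # 2
print(labels_rem)                     # [0 1]
```

The threshold `eps` plays the role that the number of clusters plays in k-means: it controls how coarse the decomposition is, but the count itself falls out of the graph structure.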
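The self-paced learning idea in the second algorithm can also be sketched in a few lines. This sketch uses the common hard-weighting rule, in which sample i is included (v_i = 1) only when its loss is below an age parameter λ that grows at each stage, so easy samples are learned first and harder ones are admitted later. To keep it short, plain k-means with caller-supplied initial centers stands in for the thesis's one-step spectral clustering model; missing value processing and spectral rotation are omitted. All names and parameter values (`self_paced_kmeans`, `lam0`, `growth`, `n_stages`) are hypothetical.

```python
import numpy as np


def spl_weights(losses, lam):
    """Hard self-paced weights: the minimiser over v in {0,1}^n of
    sum(v * losses) - lam * sum(v), i.e. v_i = 1 iff losses_i < lam."""
    return (losses < lam).astype(float)


def self_paced_kmeans(X, init_centers, lam0, growth=1.5, n_stages=5, n_iter=20):
    """Stand-in model: k-means whose center updates use only 'easy' samples (v == 1)."""
    centers = np.array(init_centers, dtype=float)  # copy so the caller's array is untouched
    lam = lam0
    for _ in range(n_stages):
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            assign = d2.argmin(axis=1)
            losses = d2.min(axis=1)  # per-sample loss: squared distance to nearest center
            v = spl_weights(losses, lam)
            for j in range(len(centers)):
                mask = (assign == j) & (v > 0)
                if mask.any():
                    centers[j] = X[mask].mean(axis=0)
        lam *= growth  # the next stage admits harder samples
    # Assignments and weights come from the final iteration.
    return assign, v, centers


# Usage: two blobs plus one far outlier (last row).
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.1, size=(20, 2))
b = rng.normal(5.0, 0.1, size=(20, 2))
X = np.vstack([a, b, [[50.0, 50.0]]])
assign, v, centers = self_paced_kmeans(X, init_centers=X[[0, 20]], lam0=1.0)
print(v[-1])  # 0.0: the outlier never enters a center update
```

Because the outlier's loss stays far above λ at every stage, it never contributes to a center update, whereas an unweighted k-means update would pull the second center toward it.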
Keywords/Search Tags: Local Principal Component Analysis, Connected Graph Decomposition, Missing Value, Self-paced Learning, One-step Spectral Clustering