Font Size: a A A

Research On Spectral Clustering Algorithm Based On Weighted Ensemble Nystr?m Sampling

Posted on:2020-05-13Degree:MasterType:Thesis
Country:ChinaCandidate:C LiuFull Text:PDF
GTID:2428330623965346Subject:Software engineering
Abstract/Summary:PDF Full Text Request
Spectral clustering algorithm is a research hotspot in the field of machine learning in recent years.It is based on algebraic graph theory and can effectively solve many practical problems.However,in the process of spectral clustering,the computational complexity of feature decomposition is usually O(n~3),that limits the application of spectral clustering algorithm in big data.The Nystr?m method uses several sampling points in the dataset to perform approximate calculations and approximates the real feature space,effectively solving the problem of spectral clustering on time and space overhead on large-scale data.Aiming at the problem that the existing Nystr?m method has unstable clustering effect and weak sample representation in spectral clustering application,a spectral clustering algorithm based on weighted ensemble Nystr?m sampling is proposed.Firstly,the overall weight of the data is evaluated.The statistical leverage score is used to distinguish the importance between the data,and the weight matrix reflecting the importance of the sample is obtained.This weighting method can find influential data points in the data distribution,ensuring the difference and validity of data points in the sample kernel matrix.Secondly,based on the weights,a weighted K-means central point sampling algorithm is used to obtain multiple sets of representative sampling points,and a sample matrix similar to the data distribution is obtained.Nystrom approximate calculation is carried out on these highly representative sample points,and the eigenvector of the approximate kernel matrix is closer to the real eigenvector value.Thirdly,an ensemble framework is introduced to construct multiple approximate kernel matrices by running Nystrom method several times in parallel.Finally,the ridge regression method is used to determine the mixing weights,and the approximate kernel matrix combinations are optimized to generate the final approximate kernel matrix,which produces a more accurate low rank approximation than the standard Nystr?m method.Experimental results of UCI data set show that the proposed algorithm has better clustering performance and is more suitable for solving data clustering problem under the ensemble clustering framework.The thesis has 7 figures,7 tables,and 58 references.
Keywords/Search Tags:spectral clustering, Nystr?m sampling, statistical leverage score weighting, weighted ensemble Nystr?m, ensemble framework
PDF Full Text Request
Related items