A Research Of Deep Learning Based Clustering Algorithm

Posted on: 2021-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: X Lu
GTID: 2428330623467814
Subject: Computer Science and Technology
Abstract/Summary:
As data collection and transmission continue to grow, data has become increasingly high-dimensional and unstructured, and data mining has become an indispensable tool. Unsupervised learning is an important branch of this field because it does not require manually labeled data. Clustering is an unsupervised method that partitions data into multiple subclasses according to a given rule. Although classical algorithms such as K-means and DBSCAN achieve good results on low-dimensional structured data, the curse of dimensionality makes them difficult to apply directly to high-dimensional data. In response, many dimensionality reduction methods have been proposed and applied as a preprocessing step before clustering. However, such methods often rely on too many hand-crafted assumptions to adapt well to high-dimensional unstructured data.

Because high-dimensional data may lie in a union of multiple low-dimensional subspaces, subspace clustering has emerged as an effective class of algorithms. It assigns the samples to multiple low-dimensional subspaces, performing cluster assignment and uncovering the subspace structure at the same time. Among these, self-expression-based methods have been effective at preserving manifold structure, but as data grows more complex they struggle to handle diverse types of high-dimensional data such as audio, images, and text. In recent years, the non-linear mapping capability of artificial neural networks and deep learning has enabled feature extraction at larger scale and greater depth, and using deep learning to improve clustering performance has become a research trend. However, most current deep learning-based algorithms use Euclidean distance as the similarity between samples, which makes it difficult to preserve the true relationships between them.

To address this problem, this thesis proposes the subspace consistency hypothesis and locality-preserving constraints, and proposes two algorithms: Consistent Subspace Clustering Network (SCC) and Relation Guided Subspace Clustering Network (RGSC). The thesis systematically compares related methods through experiments and demonstrates the effectiveness of the proposed methods through a comparison of clustering performance. The experiments also visualize the learned similarities and include parameter sensitivity tests. Because the method is limited by sample size, the thesis also extends the proposed algorithm to large-scale datasets and reports experiments on them.
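
The concern about Euclidean distance can be illustrated with a small sketch. This is not the method proposed in the thesis; the three sample points and the absolute-cosine affinity are assumptions chosen only to show how a distance-based neighbor and a subspace-based neighbor can disagree when the data lies on a union of low-dimensional subspaces.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def abs_cosine(u, v):
    """Absolute cosine similarity: 1.0 when u and v span the same line
    through the origin, 0.0 when they are orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return abs(dot) / (math.hypot(*u) * math.hypot(*v))

# Hypothetical data: two one-dimensional subspaces (the x-axis and the y-axis).
a = (0.1, 0.0)  # on the x-axis, close to the origin
b = (5.0, 0.0)  # on the x-axis, far from the origin
c = (0.0, 0.1)  # on the y-axis, close to the origin

# Euclidean distance pairs a with c, although they lie in different subspaces.
nearest_by_distance = min((b, c), key=lambda p: euclidean(a, p))

# A subspace-aware affinity pairs a with b, which shares its subspace.
nearest_by_subspace = max((b, c), key=lambda p: abs_cosine(a, p))

print("nearest by Euclidean distance:", nearest_by_distance)
print("nearest by subspace affinity: ", nearest_by_subspace)
```

In this toy setting the distance-based neighbor of `a` is `c`, while the affinity-based neighbor is `b`. Real subspace clustering methods learn the affinity from the data rather than fixing it in advance as done here.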
Keywords: clustering, deep learning, neural network, subspace clustering, spectral clustering