Research On K-means Algorithm Based On Denoising Auto-encoder

Posted on:2022-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:H Xu

GTID:2518306539491984

Subject:Computer Science and Technology

Abstract/Summary:

With the advent of the era of intelligence, both the volume and the dimensionality of the data sets used in machine learning and data mining are growing explosively. High-dimensional, sparsely distributed data pose a severe challenge to existing clustering methods. On such data, trained models often suffer from over-fitting, long training times and poor results, a phenomenon known as the curse of dimensionality.

Cluster analysis divides data into classes or clusters with similar characteristics according to a similarity criterion. Its goal is to make the data within the same cluster as similar as possible and the data in different clusters as different as possible. K-means is one of the most widely used partition-based clustering algorithms. It is easy to implement and clusters quickly, but it also has limitations: it is sensitive to the initial cluster centers and easily falls into a local optimum, and irrelevant features in the data seriously degrade the clustering result. In addition, the positions of the cluster centers must be updated in every iteration, so the efficiency of the algorithm drops sharply on large, high-dimensional data sets.

To address the poor performance of K-means on high-dimensional, sparsely distributed data, a K-means algorithm based on a denoising autoencoder is proposed. First, the denoising autoencoder reduces the dimensionality of the data so that the important feature representations stand out; then the reduced data are clustered with K-means. Experimental results on several data sets show that the proposed algorithm is effective for high-dimensional data and significantly improves on the performance of the original K-means algorithm.

The main innovations of this algorithm are as follows: (1) The denoising autoencoder effectively reduces the data dimension and highlights the important feature representations. (2) Running K-means on the data produced by the denoising autoencoder reduces the amount of computation and improves the efficiency of the algorithm. At the same time, highlighting the important feature representations improves the clustering result.
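The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no architecture or training details, so the single hidden layer, sigmoid encoder, linear decoder, Gaussian input noise, squared-error loss, plain SGD and all hyperparameter values below are assumptions, and the function names `train_dae`, `kmeans` and `dae_kmeans` are hypothetical. The code uses only the Python standard library to stay self-contained.

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, z))))


def train_dae(data, n_hidden, noise_std=0.1, lr=0.1, epochs=50, seed=0):
    """Train a one-hidden-layer denoising autoencoder with per-sample SGD.

    Returns an encode(x) function that maps a sample to its n_hidden-dim code.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in

    def encode(x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(W1, b1)]

    for _ in range(epochs):
        for x in data:
            # Corrupt the input; the reconstruction target stays the clean sample.
            x_noisy = [v + rng.gauss(0.0, noise_std) for v in x]
            h = encode(x_noisy)
            y = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(W2, b2)]
            # Gradient of 0.5 * ||y - x||^2 with respect to the linear output.
            dy = [yi - xi for yi, xi in zip(y, x)]
            # Backpropagate through the decoder weights and the sigmoid.
            dh = [sum(W2[i][j] * dy[i] for i in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * dh[j] * x_noisy[i]
                b1[j] -= lr * dh[j]
    return encode


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dae_kmeans(data, k, n_hidden, seed=0):
    """Reduce dimensionality with the autoencoder, then cluster the codes."""
    encode = train_dae(data, n_hidden, seed=seed)
    codes = [encode(x) for x in data]  # clean inputs are encoded after training
    return codes, kmeans(codes, k, seed=seed)


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: two groups of 8-dimensional samples around 0.2 and 0.8.
    toy = [[m + rng.uniform(-0.05, 0.05) for _ in range(8)]
           for m in (0.2, 0.8) for _ in range(10)]
    codes, labels = dae_kmeans(toy, k=2, n_hidden=2)
    print(labels)
```

In this sketch K-means computes distances in the `n_hidden`-dimensional code space rather than in the original input space, which is where the reduction in per-iteration cost mentioned in the abstract would come from. A practical version would more likely use a numerical or deep-learning library and a deeper encoder.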

Keywords/Search Tags:

High-dimensional data, Clustering, Autoencoder, Dimension reduction, K-means

Related items

1	Research On Dimension Reduction Algorithms For Preserving Clustering Structures
2	Dimension Reduction And Clustering For High-Dimensional Data
3	Research On Dimension Reduction Methods Of High Dimensional Data
4	Neural Network Based Dimensionality Reduction And Its Application In High-dimensional Data Clustering
5	Research On Dimension Reduction Methods For High-dimensional Complex Data
6	Research On Constructing Deep Structure Model For Dimension Reduction And Classification Of High-Dimensional Data
7	Research On And Design Of Dimensionality Reduction Algorithm For The High Dimensional Data
8	Cluster analysis of high dimensional data and dimension reduction for regression
9	A High Dimensional Data Stream Clustering Algorithm Of Quick Dimension Reduction
10	The Study And Implementation Of High Dimensional Data Visualization Platform Based On Nonlinear Dimensionality Reduction Methods