
Research On K-means Algorithm Based On Denoising Auto-encoder

Posted on: 2022-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
Full Text: PDF
GTID: 2518306539491984
Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the era of intelligent computing, both the volume and the dimensionality of the data sets used in machine learning and data mining are growing explosively. High-dimensional, sparsely distributed data pose a severe challenge to existing clustering methods. On such data, trained models often suffer from over-fitting, long training times, and poor results, a problem known in computer science as the curse of dimensionality.

Cluster analysis divides data into classes or clusters of similar items according to a similarity criterion. Its goal is to make the data within the same cluster as similar as possible and the data in different clusters as different as possible. K-means is a widely used partition-based clustering algorithm. It is easy to implement and clusters quickly, but it has some limitations: it is sensitive to the initial cluster centers and easily falls into a local optimum, and irrelevant features in the data seriously degrade the clustering result. In addition, the positions of the cluster centers must be updated at every iteration, so the efficiency of the algorithm drops sharply on large, high-dimensional data sets.

To address the poor performance of K-means on high-dimensional, sparsely distributed data, a K-means algorithm based on a denoising autoencoder is proposed. First, the denoising autoencoder reduces the dimensionality of the data and highlights the important feature representations; then the reduced data are clustered with K-means. Experimental results on several data sets show that the proposed algorithm is effective for high-dimensional data and can significantly improve on the performance of the original K-means algorithm.

The main innovations of this algorithm are as follows: (1) The denoising autoencoder effectively reduces the data dimensionality and highlights important feature representations. (2) Running K-means on the reduced-dimension data lowers the computational cost and improves the efficiency of the algorithm, while the highlighted feature representations improve the clustering quality.
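The two-stage pipeline described in the abstract (denoising autoencoder for dimensionality reduction, then K-means on the learned codes) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: the abstract gives no architecture or hyperparameters, so the single tanh hidden layer, the masking noise, the full-batch gradient descent, the synthetic data, and the names `train_denoising_autoencoder` and `kmeans` are all assumptions chosen for brevity.

```python
import numpy as np

def train_denoising_autoencoder(X, n_hidden, noise=0.3, lr=0.2, epochs=500, seed=0):
    """Train a one-hidden-layer denoising autoencoder (assumed architecture).

    Returns a function that maps data to its n_hidden-dimensional code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        # Masking noise: zero out a random fraction of the input entries.
        X_noisy = X * (rng.random(X.shape) >= noise)
        H = np.tanh(X_noisy @ W1 + b1)   # encoder
        X_hat = H @ W2 + b2              # linear decoder
        # Gradient of the mean squared error against the CLEAN input.
        G = 2.0 * (X_hat - X) / X.size
        dW2 = H.T @ G
        db2 = G.sum(axis=0)
        dH = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        dW1 = X_noisy.T @ dH
        db1 = dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return lambda data: np.tanh(data @ W1 + b1)

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data (assumption): 3 clusters of 100 points in 20 dimensions.
rng = np.random.default_rng(42)
true_centers = rng.random((3, 20))
X = np.repeat(true_centers, 100, axis=0) + 0.1 * rng.normal(size=(300, 20))

encode = train_denoising_autoencoder(X, n_hidden=5)
Z = encode(X)                      # reduced representation, 20 -> 5 dimensions
labels, centers = kmeans(Z, k=3)   # cluster in the reduced space
print(Z.shape, centers.shape)
```

Because K-means computes distances in the 5-dimensional code space rather than the original 20-dimensional space, each iteration touches fewer coordinates, which is the efficiency argument made in innovation (2). Whether clustering quality also improves depends on the data and on how well the autoencoder is trained; this sketch does not measure it.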
Keywords/Search Tags: high-dimensional data, clustering, autoencoder, dimension reduction, K-means