
Variational Auto-Encoder Combined With T-Distributed Stochastic Neighbor Embedding For Dimensionality Reduction And Cluster Analysis

Posted on: 2020-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Guo
Full Text: PDF
GTID: 2428330596982764
Subject: Applied statistics
Abstract/Summary:
Today's Internet is booming, and with the advance of technology our access to information has grown rapidly; the development of big data has entered an intense phase. However, the data encountered in many fields are high-dimensional, and thousands of dimensions pose great challenges for subsequent analysis and computation: many commonly used algorithms fail on high-dimensional data sets. To mine and analyze the latent information in high-dimensional data, a family of dimensionality-reduction algorithms has emerged. The core idea of dimensionality reduction is to apply some mapping to data in a high-dimensional space so as to obtain a representation in a low-dimensional space, where existing low-dimensional algorithms can then be applied.

In this paper, a VAE built on an MLP neural network is combined with t-distributed stochastic neighbor embedding (t-SNE) to reduce the dimension of high-dimensional data in an unsupervised manner. We design a three-layer encoder and decoder: the encoder extracts features, and the decoder reconstructs an approximation of the original sample. The network is trained with mini-batch gradient descent. The encoder first reduces the high-dimensional data to an intermediate dimension, t-SNE then performs a further reduction, and finally K-means is applied to the resulting low-dimensional data.

Experiments show that black-box variational inference improves the flexibility and versatility of the model for large sample sizes and high dimensions, which yields a better dimensionality-reduction effect. Moreover, t-SNE is the method best suited to keeping the neighborhood probability distribution of the low-dimensional data consistent with that of the high-dimensional data: after the data are reduced to the intermediate dimension, t-SNE maps distant points farther apart in the low-dimensional space, which avoids the crowding of data points and maximizes the consistency between the final result and the intermediate-dimension space. Compared with the traditional PCA method, the proposed method extracts data features more effectively, improves the between-class dispersion in cluster analysis, and achieves better clustering. Finally, numerical examples illustrate the effectiveness of the proposed algorithm.
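The three-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: a plain linear autoencoder trained by mini-batch gradient descent stands in for the MLP-based VAE (the variational/black-box-inference part is omitted for brevity), and the toy data, dimensions, and hyperparameters are all assumptions made for the example.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy "high-dimensional" data: three Gaussian blobs in 50 dimensions,
# standardized per coordinate for stable gradient descent.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 50))
               for c in (0.0, 3.0, 6.0)])
X = (X - X.mean(axis=0)) / X.std(axis=0)

# --- Step 1: autoencoder trained by mini-batch gradient descent ----------
d_in, d_mid = X.shape[1], 10                     # intermediate dimension (assumed)
We = rng.normal(scale=0.1, size=(d_in, d_mid))   # encoder weights
Wd = rng.normal(scale=0.1, size=(d_mid, d_in))   # decoder weights
lr, batch = 1e-4, 16

for epoch in range(200):
    order = rng.permutation(len(X))
    for i in range(0, len(X), batch):
        B = X[order[i:i + batch]]
        Zb = B @ We                  # encode
        Err = Zb @ Wd - B            # reconstruction error
        # Gradients of the mean squared reconstruction loss.
        gWd = Zb.T @ Err * (2.0 / len(B))
        gWe = B.T @ (Err @ Wd.T) * (2.0 / len(B))
        We -= lr * gWe
        Wd -= lr * gWd

Z = X @ We                           # codes in the intermediate dimension

# --- Step 2: t-SNE reduces the intermediate codes to 2-D -----------------
Y2 = TSNE(n_components=2, perplexity=15, init="pca",
          random_state=0).fit_transform(Z)

# --- Step 3: K-means clusters the 2-D embedding --------------------------
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y2)
print(Y2.shape, len(set(labels)))    # -> (90, 2) 3
```

Replacing the linear encoder with the thesis's three-layer MLP VAE changes step 1 only; steps 2 and 3 operate on whatever intermediate codes the encoder produces.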
Keywords/Search Tags:Variational auto-encoder, t-distributed stochastic neighbor embedding, Mini-batch gradient descent, K-means, Dimensionality reduction