Improved Algorithm Based On Canopy's High-dimensional Sample Similarity Measurement And Group Weighted T-SNE

Posted on: 2020-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: F Du
GTID: 2438330596497498
Subject: Instrumentation engineering
Abstract/Summary:
With the growth in data scale and complexity, techniques for processing high-dimensional data are becoming increasingly important. Dimensionality reduction in particular is of great significance for classification, clustering, time-series prediction, and correlation analysis. Dimensionality reduction techniques are generally divided into linear and nonlinear methods. The t-distributed stochastic neighbor embedding (t-SNE) algorithm is a nonlinear dimensionality reduction method that is widely used in many fields. Its principle is to compute the joint probabilities between sample pairs in the high-dimensional space and then match them with the joint probabilities of the corresponding sample pairs in the low-dimensional space, thereby mapping the high-dimensional data to a low-dimensional space.

In current research, the actual distribution of the samples in the high-dimensional space is generally not considered when the algorithm is implemented. That is, regardless of how similar the samples are in the high-dimensional space, the same probability calculation is applied to all of the data. This reduces the separation of the low-dimensional mapping results, and the clustering results are not as good as expected. To address this problem, this paper proposes a grouped weighted t-SNE algorithm based on Canopy (Canopy-tsne). The Canopy algorithm is used to analyze the degree of similarity among samples in the high-dimensional space. On this basis, the similarity between high-dimensional samples is divided into three cases: high, moderate, and low similarity. Adaptive weights are then added when calculating the joint probability between every pair of sample points in the high-dimensional space, which yields a more accurate similarity measure and, in turn, a better dimensionality reduction effect.

To verify the dimensionality reduction effect of the Canopy-tsne algorithm, experiments were carried out in two application fields. In the first, Canopy-tsne was applied to the supervised dimensionality reduction of singular handwritten digit samples. Compared with standard t-SNE and other algorithms, Canopy-tsne effectively eliminates the problems of incomplete clusters and sample crossover, and the performance indicators after dimensionality reduction show that recall and precision are significantly improved over standard t-SNE. In the second, Canopy-tsne was applied to unsupervised brain network state observation matrices. Compared with standard t-SNE and other algorithms, Canopy-tsne effectively eliminates the crossover and scattering of brain network states at different times, and the performance indicators after dimensionality reduction show that the Davies-Bouldin Index, Dunn Index, and Silhouette Coefficient of the brain network state clustering are significantly improved over standard t-SNE. The proposed method therefore achieves better dimensionality reduction for both supervised classification problems and unsupervised clustering problems, and provides a meaningful solution for the dimensionality reduction of high-dimensional data.
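The grouped weighting idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formula, so everything specific here is an assumption made for illustration: the function names (`canopy`, `pair_weights`, `weighted_joint_probabilities`), the rule that maps Canopy membership to the three similarity levels, the fixed weights `w_high`, `w_mid`, `w_low` (the thesis describes adaptive weights), and the single Gaussian bandwidth `sigma` used in place of the per-point perplexity search of standard t-SNE.

```python
import numpy as np


def canopy(X, t1, t2, seed=0):
    """Canopy clustering with loose threshold t1 > tight threshold t2.

    Returns a list of (center, loose_members, tight_members) index tuples.
    Membership is evaluated against all points, so canopies may overlap.
    """
    assert t1 > t2
    rng = np.random.default_rng(seed)
    remaining = rng.permutation(len(X)).tolist()
    canopies = []
    while remaining:
        center = remaining[0]
        dist = np.linalg.norm(X - X[center], axis=1)
        loose = np.flatnonzero(dist < t1)
        tight = np.flatnonzero(dist < t2)
        canopies.append((center, loose, tight))
        # Points inside the tight radius can no longer become centers.
        tight_set = set(tight.tolist())
        remaining = [i for i in remaining if i not in tight_set]
    return canopies


def pair_weights(X, t1, t2, w_high=1.5, w_mid=1.0, w_low=0.5, seed=0):
    """Assumed rule: a pair is 'high' if both points lie in the tight
    region of one canopy, 'moderate' if they only share a canopy, and
    'low' otherwise. The weight values are illustrative, not adaptive."""
    n = len(X)
    W = np.full((n, n), w_low)
    canopies = canopy(X, t1, t2, seed)
    for _, loose, _ in canopies:
        W[np.ix_(loose, loose)] = w_mid
    # Second pass so that 'high' always overrides 'moderate'.
    for _, _, tight in canopies:
        W[np.ix_(tight, tight)] = w_high
    return W


def weighted_joint_probabilities(X, W, sigma=1.0):
    """Gaussian pair affinities scaled by group weights, then normalized
    so that all entries sum to 1 (simplified: one shared bandwidth)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = W * np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbor
    return P / P.sum()
```

The resulting matrix would replace the usual high-dimensional joint probabilities in the t-SNE objective; the low-dimensional side and the gradient descent are unchanged in this sketch and are therefore omitted.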
Keywords/Search Tags: Canopy-tsne algorithm, Grouping weighted algorithm, High-dimensional dimensionality reduction, Singular handwritten digits classification, Brain network state clustering