
Dimensionality Reduction Technique For Visualization In Wasserstein Space

Posted on: 2022-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Chen
Full Text: PDF
GTID: 2518306575466524
Subject: Computer technology
Abstract/Summary:
High-dimensional data is now common in fields such as statistics, data science, machine learning, the life sciences, and business. Complex problems in these fields often involve data with tens or hundreds of feature dimensions, such as satellite remote sensing data, pathology statistics, and complex chemical compositions. Although advances in information technology have made it possible to acquire and store detailed features, the sheer volume of information makes data analysis and pattern mining difficult. To help analysts extract the information they need, the visualization of high-dimensional data has become an important branch of information visualization. This thesis presents a new dimensionality reduction method for visualization that is particularly suited to clustering tasks on high-dimensional data and aims to give users intuitive, clearly distinguishable two-dimensional projections. The thesis makes two main contributions. First, the method replaces the KL divergence used in the original t-SNE loss function with the Wasserstein distance in order to better match the different probability measures in the high- and low-dimensional spaces, which yields projections with a stronger clustering effect. Second, the method introduces the biharmonic distance and computes distances between high-dimensional data points on the basis of manifold learning. The biharmonic distance is more robust to noise and to wrongly connected edges, while still preserving the topology of the original data. The visualization results and quantitative metrics are compared with those of mainstream methods such as t-SNE and UMAP on several datasets. The visual comparison shows that the proposed method separates clusters well. The quantitative metrics show that, because of the Wasserstein distance and the biharmonic distance, the proposed method preserves local detail slightly less well than t-SNE and UMAP but preserves global structure better. Finally, the thesis shows, both experimentally and analytically, how to select and tune the method's hyperparameters so that users can obtain better projections.
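
As a rough illustration of the two ingredients named above, the following Python sketch computes biharmonic distances on a small graph and contrasts a one-dimensional Wasserstein distance with the KL divergence. It is not the thesis's implementation: the function names, the toy path graph, the use of the combinatorial graph Laplacian, and the one-dimensional setting are all assumptions made for illustration, and the abstract does not say how the neighborhood graph is built or how the Wasserstein term enters the loss. The biharmonic distance here follows the standard discrete definition based on the squared pseudoinverse of the graph Laplacian.

```python
import numpy as np


def biharmonic_distances(W):
    """Pairwise biharmonic distances on a connected, undirected weighted graph.

    W is a symmetric (n, n) array of nonnegative edge weights (assumed input).
    """
    L = np.diag(W.sum(axis=1)) - W           # combinatorial graph Laplacian
    Lp = np.linalg.pinv(L)                   # pseudoinverse; ignores the constant mode
    G = Lp @ Lp                              # discrete Green's function of the bi-Laplacian
    g = np.diag(G)
    D2 = g[:, None] + g[None, :] - 2.0 * G   # squared biharmonic distances
    return np.sqrt(np.maximum(D2, 0.0))      # clip tiny negatives caused by round-off


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size 1D samples (uniform weights)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    return float(np.mean(np.abs(x - y)))


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; infinite if q = 0 where p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    with np.errstate(divide="ignore"):
        return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


# Toy graph (hypothetical): a path 0 - 1 - 2 - 3 with unit edge weights.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
D = biharmonic_distances(W)
print(D[0, 1] < D[0, 3])   # the end nodes are farther apart than neighbors

# Two samples with disjoint supports: Wasserstein reports how far apart they
# are, while KL on the matching histograms is infinite regardless of the gap.
x = np.array([0.0, 1.0, 2.0])
print(wasserstein_1d(x, x + 3.0))
print(kl_divergence([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]))
```

The dense pseudoinverse costs O(n^3) and is only practical for small graphs; a larger dataset would need a sparse or truncated spectral approximation, which this sketch does not attempt.
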
Keywords/Search Tags:high-dimensional data visualization, dimensionality reduction, Wasserstein distance, optimal mass transport, biharmonic distance