Research On Dimensionality Reduction And Clustering Methods For High-dimensional Data Based On Metric Learning

Posted on:2024-07-16

Degree:Master

Type:Thesis

Country:China

Candidate:L H Chen

Full Text:PDF

GTID:2568307067478124

Subject:Statistics

Abstract/Summary:

With the continuous progress of data collection technology, ever more high-dimensional data are being acquired. Such data carry richer and more comprehensive information, but they also bring problems such as the "curse of dimensionality", which makes it difficult for traditional data processing methods to reveal the information inherent in the data. How to design efficient methods for analyzing and processing high-dimensional data has therefore become an important research topic in machine learning, image processing, data mining and computer vision. To this end, this thesis investigates dimensionality reduction and clustering methods for high-dimensional data based on distance metric learning. The specific work is as follows.

First, we propose a semi-supervised dimensionality reduction method based on an $\ell_{2,\cdot}$-norm distance. It addresses a shortcoming of dimensionality reduction methods based on distance metric learning: the low-dimensional representation provides no reasonable feedback to the construction of the distance metric matrix. The method couples the low-dimensional representation and the distance metric through interaction terms, so that the dimensionality reduction matrix and the distance metric matrix are learned jointly. The proposed algorithm is further extended with kernels in order to cope with more complex nonlinear data. Experimental results show that the proposed method is effective and robust in KNN classification and improves significantly on classical distance metric learning and dimensionality reduction methods.

In addition, traditional distance-based clustering algorithms lose effectiveness because distance metrics degrade in high-dimensional space. To address this, we propose a clustering algorithm based on local affine hull distance, which aims to improve how distances are measured for clustering in high-dimensional space. Specifically, we divide the high-dimensional sample space into multiple local affine hulls and use the distance between an unknown sample and each affine hull to obtain the similarity between samples. The concept of uncorrelated subspaces is also introduced to incorporate the idea of discriminant analysis into the clustering framework. In this framework, clustering generates class labels for the affine model, while the affine model provides subspaces for clustering. To test the effectiveness of the proposed method, comparative experiments with several existing clustering methods are conducted. The results show that the clustering algorithm based on local affine hull distance achieves better clustering results in high-dimensional space and better handles the distance metric problem on high-dimensional datasets.

To further optimize the clustering algorithm based on local affine hull distance, we propose a clustering algorithm based on hyperdisk distance. The algorithm uses the hyperdisk generated by an affine hull and a hypersphere to strictly constrain the position of samples in the subspace, which yields a more compact approximation of class regions. Specifically, we use the hyperdisk as a local approximation of the samples and redefine the distance metric accordingly, so that the clustering task is executed more efficiently in the subspace. Finally, the proposed algorithm is compared experimentally with several existing clustering methods. The results show that the clustering algorithm based on hyperdisk distance achieves high clustering accuracy.
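
The two set-based distances mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: it only computes the distance from a point to the affine hull of a small group of samples, and to a hyperdisk taken here as the intersection of that affine hull with a hypersphere. The function names are hypothetical, and using the sample mean as the sphere center and the largest sample-to-mean distance as the radius are simplifying assumptions, as is the choice of Gram-Schmidt for building the subspace basis.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def orthonormal_basis(points, center, tol=1e-10):
    """Modified Gram-Schmidt on the centered samples.

    The returned vectors span the directions of the affine hull.
    """
    basis = []
    for p in points:
        v = sub(p, center)
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        length = norm(v)
        if length > tol:  # skip directions already covered by the basis
            basis.append([vi / length for vi in v])
    return basis

def project_to_affine_hull(x, center, basis):
    """Orthogonal projection of x onto center + span(basis)."""
    d = sub(x, center)
    p = list(center)
    for b in basis:
        c = dot(d, b)
        p = [pi + c * bi for pi, bi in zip(p, b)]
    return p

def affine_hull_distance(x, points):
    """Distance from x to the affine hull of the given samples."""
    center = mean(points)
    basis = orthonormal_basis(points, center)
    return norm(sub(x, project_to_affine_hull(x, center, basis)))

def hyperdisk_distance(x, points):
    """Distance from x to the hyperdisk of the given samples.

    Assumption: the disk is centered at the sample mean and its radius
    is the largest sample-to-mean distance.
    """
    center = mean(points)
    radius = max(norm(sub(p, center)) for p in points)
    basis = orthonormal_basis(points, center)
    p = project_to_affine_hull(x, center, basis)
    offset = sub(p, center)
    length = norm(offset)
    if length > radius:
        # The projection falls outside the sphere: pull it back to the
        # sphere boundary along the line to the center, staying in the hull.
        scale = radius / length
        p = [ci + scale * oi for ci, oi in zip(center, offset)]
    return norm(sub(x, p))

cluster = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
print(affine_hull_distance([5.0, 0.0, 3.0], cluster))  # 3.0
print(hyperdisk_distance([5.0, 0.0, 3.0], cluster))
```

In this toy example the affine hull is the whole line through the two samples, so a point far along that line is still only 3 units away from it. The hyperdisk is bounded, so the same point ends up farther away (about 4.24), which is the kind of tighter class-region approximation the abstract attributes to the hyperdisk.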

Keywords/Search Tags:

High-dimensional data, Distance metric learning, Subspace clustering, Dimensionality reduction

Related items

1	Research And Application Of Density Clustering Algorithm Based On Kernel Principal Component And High Dimensional Distance
2	Research On Asymptotic Theory And Methods For Clustering Ultra-high Dimensional Data
3	Dimensionality Reduction And Classification Of High-dimensional Data Using Cosine Metric
4	Research Of Classification Method On High-Dimensional Image Data
5	Research On Clustering Algorithm Of High Dimensional Data And Its Distance Metric
6	Globality And Locality Incorporation In Distance Metric Learning
7	A Perception-Driven Approach To Supervised Dimensionality Reduction For Visualization
8	Dimensionality Reduction Technique For Visualization In Wasserstein Space
9	Some Researches On Semantic-based 3D Model Retrieval Techniques
10	Research Of Dimensionality Reduction And Clustering Based On Constraint Weight Learning And Dictionary Learning