Research On Clustering Algorithms For High Dimensional Nonlinear Data

Posted on:2020-06-06

Degree:Master

Type:Thesis

Country:China

Candidate:J Li

Full Text:PDF

GTID:2428330578950922

Subject:Computer application technology

Abstract/Summary:

Clustering is an important data mining technique that divides data into categories according to certain constraints. Its main objectives are high similarity between data points in the same cluster and high dissimilarity between data points in different clusters. Because real-world data are generally high-dimensional and nonlinear, clustering such data has become an important research topic in data mining. Based on an in-depth analysis of high-dimensional nonlinear data and of the shortcomings of traditional dimensionality reduction algorithms, this thesis proposes a new feature extraction algorithm for high-dimensional nonlinear data and, building on it, a nonlinear data clustering algorithm based on a weighted manifold distance. The main results are as follows:

(1) Traditional dimensionality reduction algorithms generalize poorly, many of them require empirical guidance, and they cannot handle nonlinear incremental data. To address this, the thesis draws on information theory to adopt symmetric uncertainty (SU) as a similarity metric and proposes an SU-based feature extraction algorithm (RFE-SU). The algorithm overcomes a shortcoming of traditional principal component analysis (PCA), whose correlation coefficient cannot measure nonlinear relationships in the data. On this basis, the SU-based feature extraction algorithm is extended with a sliding window and a multi-level linked buffer mechanism so that it can be applied to the dimensionality reduction of incremental data.

(2) After the dimensionality of the high-dimensional nonlinear data has been reduced with RFE-SU, the thesis combines information theory and manifold learning to propose a weighted manifold distance based nonlinear data clustering algorithm (WMD-NLData). It designs an adaptive weight calculation method based on information entropy, computed from a global point of view, which makes the distances used during clustering closer to the objective situation. To reduce the running time and improve the efficiency of the algorithm, the thesis also analyzes how to reduce the data scale. Combining manifold learning theory with the adaptive weights, it designs a new distance metric, the weighted manifold distance, which can accurately describe the intrinsic manifold structure of nonlinear data, and proposes a clustering algorithm for nonlinear data based on this metric.

Extensive experiments show that RFE-SU quickly reduces the dimensionality of nonlinear data and preserves most of the information in the original data while improving algorithmic efficiency. WMD-NLData is able to cluster high-dimensional nonlinear data: it achieves good results on both artificial and real data sets, with substantially higher accuracy and efficiency than traditional algorithms.
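Symmetric uncertainty has a standard information-theoretic definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where I is mutual information and H is Shannon entropy; it lies in [0, 1]. The sketch below computes it for discrete (or already discretised) variables using only the Python standard library. The function names are illustrative, and how RFE-SU discretises continuous features and uses SU to select or combine them is not described in this abstract, so that part is not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), in bits, of a discrete sample."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def joint_entropy(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H(X, Y) estimated from paired samples."""
    return entropy(list(zip(xs, ys)))


def symmetric_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0.0:
        return 0.0  # convention assumed here when both variables are constant
    mutual_info = hx + hy - joint_entropy(xs, ys)  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2.0 * mutual_info / (hx + hy)
```

For x = [-2, -1, 1, 2] and y = x², the Pearson correlation coefficient is exactly 0 because y is symmetric in x, yet `symmetric_uncertainty(x, y)` returns 2/3. This illustrates the abstract's point that a correlation coefficient can miss nonlinear dependence that an information-theoretic measure detects.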
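The abstract does not give the formula behind its entropy-based adaptive weights. One widely used scheme in the same spirit is the entropy weight method: normalise each feature column into proportions p_ij, compute its normalised entropy e_j = -(1/ln n) Σ_i p_ij ln p_ij, and assign weight w_j ∝ 1 - e_j, so that features whose values vary more across samples count for more. The sketch below assumes that scheme, which may differ from the thesis's own method, and plugs the weights into a weighted Euclidean distance. Function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def entropy_weights(X: np.ndarray) -> np.ndarray:
    """Entropy weight method (an assumed stand-in for the thesis's weighting).

    X is an (n_samples, n_features) array of non-negative values with
    n_samples > 1 and a positive sum in every column. Features whose values
    vary more across samples (lower normalised entropy) get larger weights.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    P = X / X.sum(axis=0)  # p_ij: share of sample i within feature j
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(P > 0, P * np.log(P), 0.0)  # treat 0 * log 0 as 0
    e = -plogp.sum(axis=0) / np.log(n)  # normalised entropy e_j in [0, 1]
    d = 1.0 - e  # degree of divergence of feature j
    if np.isclose(d.sum(), 0.0):
        # Every feature is (nearly) constant: fall back to uniform weights.
        return np.full(X.shape[1], 1.0 / X.shape[1])
    return d / d.sum()


def weighted_euclidean(x, y, w) -> float:
    """Weighted Euclidean distance sqrt(sum_j w_j * (x_j - y_j)^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(w, dtype=float) * diff**2)))
```

A constant feature carries no information for separating samples, so under this scheme it receives a weight of (numerically) zero and drops out of the distance.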
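The abstract also leaves the weighted manifold distance undefined. A common way to approximate distances along a data manifold, used for example by Isomap, is to connect each point to its k nearest neighbours and take shortest-path (geodesic) lengths through that graph. The sketch below assumes this construction, with each edge measured by a feature-weighted Euclidean distance; the function name, the parameter `k`, and the kNN-graph choice are illustrative assumptions rather than the thesis's definition. `w` can be any non-negative feature-weight vector, for instance entropy-based weights of the kind the abstract describes. It relies on NumPy and `scipy.sparse.csgraph.shortest_path`.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path


def weighted_manifold_distance(X, w, k=5):
    """Geodesic distances over a kNN graph with weighted Euclidean edges.

    X: (n_samples, n_features) points, assumed distinct.
    w: (n_features,) non-negative feature weights.
    k: number of nearest neighbours linked to each point (hypothetical default).
    Returns an (n, n) array; disconnected pairs get np.inf.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((w * diff**2).sum(axis=-1))  # pairwise weighted Euclidean distances

    rows, cols, vals = [], [], []
    for i in range(n):
        # The k closest points other than i itself become graph neighbours.
        neighbours = [j for j in np.argsort(D[i]) if j != i][:k]
        for j in neighbours:
            rows.append(i)
            cols.append(j)
            vals.append(D[i, j])
    graph = csr_matrix((vals, (rows, cols)), shape=(n, n))

    # directed=False lets paths use an edge stored in either direction,
    # which symmetrises the (generally asymmetric) kNN relation.
    return shortest_path(graph, method="D", directed=False)
```

On a U-shaped set of points such as (0,2), (0,1), (0,0), (1,0), …, (4,0), (4,1), (4,2) with k = 2, the two tips are 4 apart in a straight line but 8 apart along the curve, which is the kind of intrinsic structure a manifold distance is meant to capture.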

Keywords/Search Tags:

Clustering, Symmetric Uncertainty, Principal Component Analysis, Weighted Manifold Distance, Nearest Neighbor Density

Related items

1	Manifold Density Peak Clustering Algorithm And Its Application Of Weibo Text Classification
2	The Research And Application Of Density Peaks Clustering
3	Research On Density-based Hierarchical Clustering Algorithm
4	Research On K-nearest Neighbor And Group K-nearest Neighbor Query For Moving Objects In Obstructed Spaces
5	Research On Affinity Propagation Clustering Algorithm
6	Study On Facial Expression Recognition Based On Manifold Learning
7	Research And Application Of Financial Big Data Based On Density Peak Clustering Of K Near Neighbors
8	Study On Generalized Nearest Neighbor Pattern Classification
9	A Weighted Kernel PCA And The Related Parameters Choice
10	Learning Structure Features In High Dimensional Data Based On Natural Nearest Neighbor