
Research On Dimensionality Reduction Of High-dimensional Data

Posted on: 2019-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: L Ding
GTID: 2428330545483607
Subject: Control Engineering (Department of Automation)
Abstract/Summary:
With the rapid development of information technology, the data we acquire, store, and process have begun to grow exponentially. These data are not only massive but also rapidly updated, and they often contain intrinsic regularities that are difficult to observe directly. How to effectively extract the required information from such high-dimensional massive data and discover its intrinsic structure has long been a fundamental problem in machine learning, and dimensionality reduction has become one of the most effective tools for it. In addition, reducing the dimensionality of high-dimensional data removes noise and irrelevant attributes, lowers the storage space the data require, helps avoid the curse of dimensionality, and improves the performance and efficiency of subsequent learning algorithms.

The two most classical dimensionality reduction methods are principal component analysis (PCA) and linear discriminant analysis (LDA). PCA is an unsupervised method whose goal is to find the projection directions that maximize the variance of the projected samples. LDA is a supervised method whose goal is to find the projection directions under which samples of the same class are compact and samples of different classes are highly distinguishable. However, both extract features under a global normal-distribution assumption; when the actual samples do not match this assumption, performance is greatly affected.

In recent years, many linear discriminant analysis algorithms based on manifold learning have been proposed, but they usually use a fixed-parameter model (such as a Gaussian function) to describe the intrinsic geometry of the data. Because real data distributions are complex, a fixed-parameter model is rarely the optimal description of the data's essential structure. This paper therefore proposes several improved algorithms that extract the intrinsic characteristics of high-dimensional data more quickly and effectively, and applies these methods to face recognition and other practical problems.
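As a concrete reference point for the two classical baselines discussed above, the following minimal textbook-style sketch (written for this summary, not code from the thesis) computes PCA by eigendecomposition of the sample covariance matrix and LDA by solving the generalized eigenproblem on the between-class and within-class scatter matrices.

```python
import numpy as np
from scipy.linalg import eigh

def pca(X, k):
    """Project n samples (rows of X) onto the k directions of maximal variance."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix
    vals, vecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    W = vecs[:, ::-1][:, :k]                 # top-k principal directions
    return Xc @ W                            # low-dimensional embedding

def lda(X, y, k):
    """Project onto k directions making classes compact and well separated."""
    d = X.shape[1]
    mean = X.mean(axis=0)
    Sw = np.zeros((d, d))                    # within-class scatter
    Sb = np.zeros((d, d))                    # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # generalized eigenproblem Sb w = lambda Sw w (small ridge keeps Sw invertible)
    vals, vecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
    return X @ vecs[:, ::-1][:, :k]

# toy usage: 100 samples in R^20, two classes
X = np.random.randn(100, 20)
y = np.repeat([0, 1], 50)
Z_pca = pca(X, k=2)
Z_lda = lda(X, y, k=1)
```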
The main research results of this paper are as follows.

1. To address the lack of robustness in most LDA-based algorithms, a dynamic weighted nonparametric discriminant analysis (DWNDA) is proposed. By introducing a dynamically weighted distance metric, the algorithm lets the distance computed between sample points adjust to the distribution characteristics of the sample patterns. When computing the within-class scatter, DWNDA accounts for the complex, multimodal distribution of same-class samples, and in particular for the differences among these modalities, so it readily extracts the intrinsic geometric features of each class. When computing the between-class scatter, DWNDA emphasizes the influence of marginal sample pairs while using the statistical characteristics of the sample points to reduce the impact of noisy samples.

2. To address the poor robustness and the hyperparameter-setting problem of graph-embedding algorithms such as locality sensitive discriminant analysis (LSDA), this paper presents a normalized locality sensitive discriminant analysis (NLSDA). By normalizing the edge weights, the algorithm constructs within-class similarity, within-class diversity, and between-class diversity adjacency graphs, thereby reducing the influence of noisy samples and improving robustness (a purely illustrative sketch of such weight normalization follows this abstract). At the same time, NLSDA learns the local topological structure of the data well without any neighborhood parameter, which resolves the hyperparameter-setting problem shared by many graph-embedding algorithms.

3. To address the damage to local structure and the poor robustness of algorithms based on the traditional Euclidean distance, this paper proposes an adaptive locality sensitive discriminant analysis (ALSDA). Building on the NLSDA proposed in this paper, an adaptive norm is introduced to measure the distance between sample points in the embedding space. ALSDA not only retains the advantages of NLSDA but also better preserves the local topology of the data and is more robust to noise and abnormal samples.

Finally, this paper compares the proposed algorithms with a series of classical dimensionality reduction methods in experiments on several face databases and on handwriting recognition. The experimental results demonstrate the effectiveness of the proposed algorithms.
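The abstract gives no formulas for the proposed methods, so the sketch below is only an illustrative guess at what "normalizing the edge weights" of a within-class adjacency graph might look like: weights derived from pairwise distances are normalized per row so every sample contributes equally, which damps the influence of noisy samples without requiring a neighborhood-size or kernel-width hyperparameter. The function and variable names are hypothetical, not taken from the thesis.

```python
import numpy as np

def normalized_within_class_graph(X, y):
    """Hypothetical sketch (not the thesis's NLSDA): within-class similarity
    graph whose edge weights are row-normalized, mirroring the parameter-free
    claim of NLSDA."""
    n = len(X)
    W = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    W[same] = 1.0 / (1.0 + d2[same])        # closer same-class pairs weigh more
    row = W.sum(axis=1, keepdims=True)
    # normalize each row so every sample contributes equal total weight
    return np.divide(W, row, out=np.zeros_like(W), where=row > 0)

# toy usage: six samples in R^3, two classes
W = normalized_within_class_graph(np.random.randn(6, 3),
                                  np.array([0, 0, 0, 1, 1, 1]))
```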
Keywords: Dimensionality reduction, Graph embedding, Adaptive norm