Research On Dimensionality Reduction Algorithm Based On Reconstruction Information Preservation

Posted on: 2018-03-31
Degree: Master
Type: Thesis
Country: China
Candidate: Q L Mei
GTID: 2358330518968364
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet and storage technology, a large amount of high-dimensional data has emerged. Such data carries richer information but also raises problems: its growth leads to the curse of dimensionality and makes data analysis harder. How to characterize high-dimensional data effectively and extract valuable information from it is a core problem to be solved. Dimensionality reduction is a useful technique that is widely applied in face recognition, image retrieval, and bioinformatics. In recent decades, as dimensionality reduction techniques have developed, the demands placed on dimensionality reduction algorithms have grown, because their performance directly affects both the information that is extracted and the accuracy of the subsequent data analysis. This thesis aims to improve how accurately data is represented in the low-dimensional space produced by dimensionality reduction. Two dimensionality reduction algorithms are proposed, which improve classification accuracy on UCI datasets and tumor datasets, respectively. The main work and innovations of this thesis are summarized as follows:

1. A neighborhood preserving embedding algorithm based on global distance and class information is proposed. The method introduces two terms into the traditional Euclidean distance formula used to build the adjacency graph: a global factor that characterizes the global distance, and a function that represents the label information. This improves the quality of the neighborhoods and constructs an optimal adjacency graph, and thereby improves classification accuracy. Experimental results demonstrate that the proposed method performs well.

2. To reduce the dimensionality of high-dimensional gene expression data and improve the separability of tumor data, and after analyzing the limitations of sparse representation and neighbor representation as well as the particular characteristics of tumor data classification, this thesis proposes discriminant hybrid structure preserving projections (DHSPP) for tumor classification. DHSPP uses a hybrid representation to efficiently characterize the structure of gene expression data, taking both neighbor representation and sparse representation into account. Specifically, DHSPP enhances data separability after dimensionality reduction by simultaneously minimizing the within-class distance and maximizing the between-class distance. Moreover, it employs an imbalance adjustment factor during the extraction process to overcome the class imbalance problem in tumor datasets. Experiments on tumor datasets demonstrate the effectiveness of the proposed DHSPP.
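
The abstract does not give the modified distance formula used in the first contribution, so the following is only a minimal sketch of the general idea of folding a global scale and label information into the Euclidean distance before selecting neighbors. The function name `label_aware_knn`, the parameter `same_class_scale`, the use of the mean pairwise distance as the global factor, and the multiplicative form of the label term are all assumptions made for illustration, not the thesis's actual method.

```python
import numpy as np

def label_aware_knn(X, y, k=2, same_class_scale=0.5):
    """Return the indices of the k nearest neighbors of each sample
    under a label-adjusted distance (illustrative assumption only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = len(X)

    # Plain pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Assumed global factor: the mean off-diagonal distance of the data set.
    global_scale = dist.sum() / (n * (n - 1))
    adjusted = dist / global_scale

    # Assumed label term: shrink distances between same-class samples so
    # that same-class points are preferred as neighbors.
    same_class = y[:, None] == y[None, :]
    adjusted = np.where(same_class, adjusted * same_class_scale, adjusted)

    # A sample is never its own neighbor.
    np.fill_diagonal(adjusted, np.inf)
    return np.argsort(adjusted, axis=1)[:, :k]
```

With `same_class_scale=1.0` the function reduces to an ordinary k-nearest-neighbor search, since dividing every distance by the same global factor does not change the ranking. Values below 1 pull same-class samples into the neighborhood, which is the effect the abstract attributes to using label information when building the adjacency graph.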
Keywords/Search Tags: Dimension Reduction, Feature Extraction, Feature Selection, Global Distance, Label Information, Neighbor Representation, Sparse Representation