
Supervised And Unsupervised Feature Selection Based On Sparse Canonical Correlation Analysis And Its Application

Posted on: 2021-09-19  Degree: Master  Type: Thesis
Country: China  Candidate: Y S Lu  Full Text: PDF
GTID: 2518306470491624  Subject: Mathematics
Abstract/Summary:
With the rapid development of computer technology, every industry now produces huge amounts of data. High-dimensional data poses great challenges for data-driven modeling: it consumes more computing time, occupies more storage, and can even degrade the performance of model learning. In addition, redundant and irrelevant features in high-dimensional data seriously hinder specific learning tasks. Feature selection, as a form of dimensionality reduction, aims to select a compact, representative subset of the existing features that describes the original data and preserves its essential characteristics. By combining canonical correlation analysis (CCA) with sparse representation, this thesis improves on existing work to overcome the shortcomings and limitations of existing models, and thereby achieves both supervised and unsupervised feature selection for high-dimensional data. The main research contents of this thesis cover the following three aspects:
(1) An adaptive sparse supervised CCA model is established. To make the problem easier to solve, existing sparse supervised CCA models usually relax the optimization objective from a combination of correlation coefficients of the canonical variables to a combination of covariances, which leads to a large deviation in feature selection. To address this, a new feature selection model, ASSCCA, is proposed by introducing a set of adaptive weight coefficients. A 5-fold cross-validation experiment is carried out on simulated data sets, including a visual comparison and evaluation of the feature selection performance of ASSCCA and a comparison of the correlation coefficients it obtains. The experimental results show that ASSCCA achieves better feature selection performance when the correlation with the supervised data is high.
(2) A CCA model with a Gaussian approximation of the l0-norm is established. For unsupervised CCA, the l0-norm penalty yields good sparsity but is computationally expensive to solve. This thesis therefore uses as the sparse regularization term a Gaussian approximation of the l0-norm, which is continuous and piecewise smooth and whose sparsity performance is close to that of the l0-norm itself. The resulting model is called GACCA. A 5-fold cross-validation experiment is carried out on four simulated data sets, including a visual comparison and evaluation of the feature selection performance of GACCA and a comparison of the correlation coefficients it obtains. The experimental results show that GACCA attains a large correlation coefficient while maintaining the accuracy of feature selection.
(3) A schizophrenia data set is studied using GACCA. The experimental results show that GACCA not only selects more brain regions and genes potentially associated with schizophrenia, but also yields better enrichment analysis results, providing auxiliary information for clinical diagnosis and further theoretical research on schizophrenia.
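To make the idea behind the Gaussian approximation of the l0-norm in (2) concrete, here is a minimal Python sketch. The abstract does not give the exact penalty used in GACCA, so the sketch assumes a commonly used Gaussian surrogate, sum_i (1 - exp(-w_i^2 / (2 sigma^2))). The function names and the width parameter `sigma` are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def l0_norm(w, tol=1e-12):
    """Exact l0 'norm': the number of entries whose magnitude exceeds tol."""
    w = np.asarray(w, dtype=float)
    return int(np.sum(np.abs(w) > tol))


def gaussian_l0(w, sigma):
    """Assumed Gaussian surrogate for the l0-norm (not confirmed by the thesis).

    Each term 1 - exp(-w_i^2 / (2 sigma^2)) is 0 at w_i = 0 and tends to 1
    once |w_i| is large relative to sigma, so the sum approximates a count
    of non-zero entries while remaining differentiable.
    """
    w = np.asarray(w, dtype=float)
    return float(np.sum(1.0 - np.exp(-w**2 / (2.0 * sigma**2))))


def gaussian_l0_grad(w, sigma):
    """Gradient of the surrogate with respect to w, usable in gradient-based solvers."""
    w = np.asarray(w, dtype=float)
    return (w / sigma**2) * np.exp(-w**2 / (2.0 * sigma**2))


if __name__ == "__main__":
    w = np.array([0.0, 0.5, -2.0, 0.0, 3.0])
    print("exact l0:", l0_norm(w))
    for sigma in (1.0, 0.1, 0.01):
        print("sigma =", sigma, "surrogate =", gaussian_l0(w, sigma))
```

As `sigma` shrinks, the surrogate approaches the exact count of non-zero entries. A larger `sigma` gives a smoother but looser approximation, so in practice the width is a tuning parameter.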
Keywords/Search Tags: Feature selection, canonical correlation analysis, Gaussian approximation l0-norm, supervised learning, schizophrenia