
Research And Its Application Of Feature Selection Based On Group Sparse Canonical Correlation Analysis

Posted on: 2021-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Zhao
GTID: 2518306470489774
Subject: Mathematics
Abstract/Summary:
The processing of multi-modal data has long been an active topic in machine learning. Large amounts of multi-modal data accumulate in many fields, and such data are typically high-dimensional and contain a great deal of redundant information, so processing them directly easily leads to the curse of dimensionality. To address this difficulty, researchers have proposed feature selection. Sparse canonical correlation analysis (SCCA) plays an important role in feature selection, but it cannot select features effectively when the data carry group-structure information. To overcome this limitation, this thesis uses group norms as regularization terms to penalize canonical correlation analysis, so that group information improves the effectiveness of feature selection. The resulting feature selection models are established and applied to a schizophrenia data set. The specific research work is as follows:

(1) The l1,2 norm, which induces intra-group sparsity, is usually applied when prior group-structure information is known. In practice, however, prior group information is often hard to obtain, which greatly limits the range of application of this kind of feature selection. Random grouping distributes important features randomly across the groups, which extends the applicability of the l1,2 norm. This thesis therefore considers two-modal data in which a group structure exists but is unknown, constructs new groups by random grouping so that the l1,2 norm can be applied, and proposes a new random-grouping sparse canonical correlation analysis model based on the l1,2 norm, called ERGSCCA. Simulated data are constructed for the experiments, which analyze the correlation coefficients and canonical variables obtained on the training and test sets and evaluate the feature selection performance of the improved model visually. The simulation results show that ERGSCCA has a stronger feature selection ability than S2CCA.

(2) When some important features are distributed both within groups and across groups, the l1,2 norm, which enforces only intra-group sparsity, is restrictive, whereas the l2,1 norm achieves inter-group sparsity. Therefore, by adding an l2,1-norm penalty to ERGSCCA, a canonical correlation analysis model combining the l1,2 and l2,1 norms is constructed, called EGSCCA. The simulation results show that EGSCCA has a stronger feature selection ability than both ERGSCCA and the sparse group lasso.

(3) Research on schizophrenia based on ERGSCCA. Because the prior group-structure information of the obtained schizophrenia data set is unknown, ERGSCCA is used. The main idea is to apply ERGSCCA directly to the two-modal, high-dimensional schizophrenia data set to identify susceptibility genes and risk brain regions associated with schizophrenia. Judged by statistical methods, gene function enrichment analysis, and other evaluation indicators, the new method shows better feature selection ability than the other models.
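To make the two group penalties concrete, here is a minimal Python sketch. It assumes one common definition of each norm, since the abstract gives no formulas: the l1,2 (exclusive lasso) norm sqrt(Σ_g (Σ_{i∈g} |w_i|)^2) and the l2,1 (group lasso) norm Σ_g ||w_g||_2; some papers use the squared l1,2 form instead. The function names, the round-robin random grouping, and the weights `lam12` and `lam21` are hypothetical illustrations, not the thesis's code. The sketch only evaluates the penalties on a canonical weight vector; it does not fit ERGSCCA or EGSCCA.

```python
import math
import random


def random_groups(n_features, n_groups, seed=None):
    """Randomly partition feature indices 0..n_features-1 into n_groups groups.

    Assumption (hypothetical): groups are kept as equal in size as possible;
    the abstract does not say how its random grouping sizes the groups.
    """
    rng = random.Random(seed)
    idx = list(range(n_features))
    rng.shuffle(idx)
    # Deal the shuffled indices round-robin so group sizes differ by at most one.
    return [sorted(idx[g::n_groups]) for g in range(n_groups)]


def l12_norm(w, groups):
    """l1,2 (exclusive lasso) norm: sqrt(sum_g (sum_{i in g} |w_i|)^2).

    The inner l1 norm makes features in the same group compete,
    which encourages sparsity *within* each group.
    """
    return math.sqrt(sum(sum(abs(w[i]) for i in g) ** 2 for g in groups))


def l21_norm(w, groups):
    """l2,1 (group lasso) norm: sum_g sqrt(sum_{i in g} w_i^2).

    The outer sum of per-group l2 norms drives whole groups to zero,
    which encourages sparsity *between* groups.
    """
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)


def combined_penalty(w, groups, lam12, lam21):
    """Hypothetical EGSCCA-style penalty: weighted sum of both group norms."""
    return lam12 * l12_norm(w, groups) + lam21 * l21_norm(w, groups)


if __name__ == "__main__":
    w = [3.0, 4.0, 0.0, 0.0]
    together = [[0, 1], [2, 3]]  # both nonzero weights in one group
    spread = [[0, 2], [1, 3]]    # nonzero weights in different groups
    print(l12_norm(w, together), l21_norm(w, together))  # 7.0 5.0
    print(l12_norm(w, spread), l21_norm(w, spread))      # 5.0 7.0
    print(random_groups(10, 3, seed=0))
```

The example shows the opposite preferences of the two terms: the l1,2 norm is smaller when the nonzero weights are spread across groups, while the l2,1 norm is smaller when they are concentrated in one group. Combining them, as EGSCCA does, lets a model trade off intra-group against inter-group sparsity.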
Keywords/Search Tags: l1,2 norm, l2,1 norm, random grouping, canonical correlation analysis, schizophrenia