
Imaging Genetics Data Association Analysis Based on Structured Sparsity

Posted on: 2019-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: D H Kan
GTID: 2404330563996015
Subject: Control theory and control engineering
Abstract/Summary:
Research on schizophrenia has long been an important part of brain science and has attracted widespread attention around the world in recent years. Many studies use statistical analysis, data mining and machine learning to explore the relationship between genetic variants and brain imaging, with the aim of aiding the clinical diagnosis and treatment of schizophrenia. However, identifying the critical genes and abnormal brain regions associated with schizophrenia among a very large number of genetic and imaging features remains challenging, so correlation analysis methods that scale to such data are needed. This thesis uses a structure-constrained sparse canonical correlation analysis (CCA) algorithm to study the association between a large number of single nucleotide polymorphisms (SNPs) and functional magnetic resonance imaging (fMRI) data, in order to find biomarkers associated with schizophrenia.

Genetic and brain imaging data are high-dimensional while the number of samples is small. Most researchers therefore apply dimensionality reduction first and perform correlation analysis afterwards. This not only discards useful information, but also fails to eliminate the overfitting caused by high dimensionality. To reduce overfitting, a regularization-based sparse representation method is used: the high-dimensional data matrix is multiplied by a sparse weight vector in which most elements are zero, so that only the most salient features are preserved. On the one hand, to account for linkage disequilibrium (LD) in the genome and the spatial structure of brain regions, a data-driven feature network is used as a prior to guide the fused lasso in feature selection. On the other hand, since neuroimaging and genetic data do not strictly follow a Gaussian distribution, it is difficult to find truly meaningful information from second-order statistics alone. Therefore, while maximizing the correlation between the two types of data, this thesis uses negentropy, a higher-order statistic, to select statistically independent variables from each type of data. Finally, the alternating least squares method is used to solve the resulting non-convex optimization model.

Cross-validation experiments were performed on simulated and real imaging genetics datasets. The results on simulated data show that the proposed algorithm clearly outperforms two other widely used sparse algorithms. The results on real data show that the algorithm can perform feature selection on ultra-high-dimensional data within a practical running time. In addition, it identifies genes and brain regions potentially associated with schizophrenia, which may support further research on mental illness.
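The abstract names negentropy as the higher-order statistic but does not say how it is estimated. As a rough illustration only, the sketch below assumes the log-cosh approximation commonly used in the independent component analysis literature, J(y) ≈ (E[G(y)] − E[G(ν)])² with G(u) = log cosh(u) and ν a standard normal variable. The function name `negentropy_logcosh` is hypothetical and is not taken from the thesis.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def _log_cosh(u):
    # Numerically stable log(cosh(u)) = log(e^u + e^-u) - log(2).
    return np.logaddexp(u, -u) - np.log(2.0)


def negentropy_logcosh(y):
    """Approximate negentropy of a 1-D sample (assumed estimator, see lead-in).

    The sample is standardized to zero mean and unit variance, then compared
    with a standard normal variable through the contrast G(u) = log cosh(u).
    """
    y = np.asarray(y, dtype=float)
    std = y.std()
    if std == 0:
        raise ValueError("sample has zero variance")
    y = (y - y.mean()) / std

    # E[G(nu)] for nu ~ N(0, 1), computed by Gauss-Hermite quadrature.
    nodes, weights = hermegauss(64)
    gauss_mean = np.sum(weights * _log_cosh(nodes)) / np.sqrt(2.0 * np.pi)

    return (np.mean(_log_cosh(y)) - gauss_mean) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print("Gaussian sample:", negentropy_logcosh(rng.standard_normal(20000)))
    print("Laplace sample: ", negentropy_logcosh(rng.laplace(size=20000)))
```

Under this approximation a Gaussian sample scores close to zero, while a heavier-tailed sample such as a Laplace one scores higher, which is the property that makes negentropy usable as a non-Gaussianity criterion.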
Keywords/Search Tags: Image genetics, correlation analysis, feature selection, sparse representation, structural constraint
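To make the idea of sparse canonical correlation analysis with alternating updates more concrete, here is a minimal sketch of a plain L1-penalized sparse CCA solved by alternating soft-thresholding. It is a simplified stand-in, not the algorithm of the thesis: it assumes identity within-set covariance and omits both the network-guided fused lasso penalty and the negentropy term. The names `sparse_cca`, `make_toy_data`, `frac_u` and `frac_v` are hypothetical.

```python
import numpy as np


def soft_threshold(a, lam):
    # Shrink every entry toward zero by lam; entries below lam become 0.
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def sparse_cca(X, Y, frac_u=0.5, frac_v=0.5, n_iter=200, tol=1e-10):
    """First pair of sparse canonical weight vectors (simplified sketch).

    frac_u and frac_v lie in [0, 1): each threshold is that fraction of the
    largest absolute entry, so the largest entry always survives.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    C = X.T @ Y / (X.shape[0] - 1)          # p x q cross-covariance

    # Initialize v with the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(C.shape[0])

    for _ in range(n_iter):
        a = C @ v                            # update u with v fixed
        u_new = _unit(soft_threshold(a, frac_u * np.abs(a).max()))
        b = C.T @ u_new                      # update v with u fixed
        v_new = _unit(soft_threshold(b, frac_v * np.abs(b).max()))
        change = np.linalg.norm(u_new - u) + np.linalg.norm(v_new - v)
        u, v = u_new, v_new
        if change < tol:
            break
    return u, v


def make_toy_data(rng, n=50, p=200, q=100, k=5):
    # Toy data: the first k columns of X and of Y share one latent signal.
    z = rng.standard_normal(n)
    X = 0.5 * rng.standard_normal((n, p))
    Y = 0.5 * rng.standard_normal((n, q))
    X[:, :k] += z[:, None]
    Y[:, :k] += z[:, None]
    return X, Y


if __name__ == "__main__":
    X, Y = make_toy_data(np.random.default_rng(0))
    u, v = sparse_cca(X, Y)
    print("non-zero weights in u:", np.flatnonzero(u))
    print("non-zero weights in v:", np.flatnonzero(v))
    print("canonical correlation:", np.corrcoef(X @ u, Y @ v)[0, 1])
```

In this toy setting the two weight vectors play the roles of SNP weights and imaging weights. A structure-constrained variant would replace the plain soft-thresholding step with an update that also penalizes differences between features connected in the prior network.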