Gene Expression Pattern Recognition And Classification Of Alzheimer's Disease Based On Deep Learning

Posted on: 2019-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Tong
GTID: 2404330596460947
Subject: Biomedical engineering
Abstract/Summary:
Alzheimer's disease (AD) is a progressive neurodegenerative disease with high incidence in the elderly. Because its pathogenesis is unclear and its progression is irreversible, AD has become a major threat to human health. Early diagnosis and early treatment can significantly prolong the survival of patients with Alzheimer's disease. At present, AD classification and the prediction of mild cognitive impairment (MCI) converters rely mainly on neuroimaging and biochemical markers. Gene expression in peripheral blood has potential value for early diagnosis of AD because it reflects physiological state and disease development in a timely manner. However, because of challenges in data acquisition and analysis, gene expression data have not been effectively applied in AD diagnosis. Deep learning algorithms, which have been applied successfully in many fields, have proved to be a powerful tool for analyzing gene expression data and identifying AD-related features. In this study, the gene expression characteristics of Alzheimer's disease are extracted by a stacked denoising autoencoder (SDAE) to support AD classification and MCI conversion prediction.

First, a 3-layer stacked denoising autoencoder is constructed to extract AD gene expression features from the ADNI dataset. The parameters of the SDAE are determined by 10-fold cross-validation experiments: the three hidden layers have 5000, 500, and 50 nodes, with corruption levels of 0.1, 0.2, and 0.1, respectively. Using the features generated by the SDAE, an SVM classifier is constructed to classify 246 normal control samples and 498 AD and MCI samples. Accuracy, precision, recall, and AUC show that the SDAE features classify AD significantly better than the raw gene expression data, PCA, and differential expression analysis. When the features from all three SDAE layers are integrated, the classification accuracy reaches 100% in 10-fold cross-validation.

Second, the SDAE features are analyzed further to find those that contribute most to the classification, so as to reduce the feature dimension. Using a modified SVM-RFE feature selection method, 43 high-contribution nodes are selected from the 5000 first-layer SDAE features. ROC curves show that classification performance decreases only slightly when the 43 high-contribution nodes are used alone, whereas the performance of the remaining nodes decreases greatly, which demonstrates the effectiveness of the high-contribution nodes. Subsequently, 5437 high-weight probes are extracted from the 43 high-contribution nodes. KEGG pathway enrichment analysis shows that the high-weight probes are significantly enriched in the Alzheimer's disease, Parkinson's disease, and Huntington's disease pathways. The pathway clustering result shows that non-alcoholic fatty liver disease correlates strongly with these three pathways, implying that they share a similar molecular mechanism. Characterization nodes are then constructed from the high-weight probes, and they outperform the high-weight probes themselves in AD classification. The performance of the characterization nodes is verified on the GSE6613 dataset, where they classify significantly better than the original probe data, PCA, and differential expression analysis. This further implies that the gene expression features extracted by the SDAE are more effective for AD classification.

Finally, an MCI conversion prediction model is constructed from 80 MCI converters and 271 MCI non-converters in the ADNI database. Compared with the original probe values, PCA, and differential expression analysis, the SDAE model significantly improves the prediction. Combining the features from all three SDAE layers gives an accuracy of 0.8577, a precision of 0.8720, a recall of 0.9240, and an AUC of 0.91. Feature selection is performed on the 5000 first-layer SDAE nodes, and 338 high-weight probes are used to generate 52 new characterization nodes. The classification performance of these characterization nodes is considerably lower, with an accuracy of only 0.7746. Nevertheless, compared with PCA and differential expression analysis, the characterization nodes still have an advantage for MCI conversion prediction.

Based on the gene expression features extracted by stacked denoising autoencoders, this thesis constructs well-performing AD classification and MCI conversion prediction models. The results indicate the superiority of the SDAE for feature extraction from gene expression data, which is also of great significance for integrating more biomarkers to assist the early diagnosis of AD.
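As an illustration of what a per-layer "corruption level" means in a denoising autoencoder, the following is a minimal sketch rather than the thesis's implementation. It assumes masking noise (each input value is zeroed with probability equal to the corruption level), which the abstract does not confirm. The function names `mask_corrupt` and `describe_layers` and the input probe count are hypothetical; only the layer sizes and corruption levels come from the abstract.

```python
import random

# Hidden layer sizes and corruption levels as reported in the abstract.
HIDDEN_SIZES = [5000, 500, 50]
CORRUPTION_LEVELS = [0.1, 0.2, 0.1]


def mask_corrupt(values, level, rng):
    """Return a copy of `values` with each entry zeroed with probability `level`.

    Masking noise is an assumption; the abstract does not name the noise type.
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("corruption level must be in [0, 1]")
    return [0.0 if rng.random() < level else v for v in values]


def describe_layers(n_inputs):
    """Pair each hidden layer with its input width and corruption level."""
    widths = [n_inputs] + HIDDEN_SIZES
    return [
        {
            "inputs": widths[i],
            "hidden": widths[i + 1],
            "corruption": CORRUPTION_LEVELS[i],
        }
        for i in range(len(HIDDEN_SIZES))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Corrupt a toy expression vector before it would be fed to a layer.
    print(mask_corrupt([1.0] * 20, CORRUPTION_LEVELS[1], rng))
    # 10000 is a placeholder probe count, not a figure from the thesis.
    for layer in describe_layers(n_inputs=10000):
        print(layer)
```

In a stacked setup, each layer would be trained to reconstruct its clean input from the corrupted copy, and the hidden activations of a trained layer would then serve as the input to the next layer.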
Keywords/Search Tags: Alzheimer's disease, mild cognitive impairment, gene expression data, autoencoder, support vector machine