
Classification And Diagnosis Of Alzheimer's Disease By L1-Regularized Logistic Regression, L1-Regularized Support Vector Machine And Gradient Boosting Decision Tree

Posted on: 2021-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: C Li
GTID: 2404330623975499
Subject: Neurobiology
Abstract/Summary:
Objective: Alzheimer's Disease (AD) is clinically diverse, individuals are heterogeneous, and a definitive diagnosis of AD can only be made at autopsy, so classifying AD in the clinic is difficult. Neuroimaging information plays an increasingly important role in AD classification. This thesis presents a technique for automatic classification based on features of the cerebral cortex, hippocampus, and basal nuclei. It applies machine learning to classify AD, Mild Cognitive Impairment (MCI), and cognitively normal elderly subjects. The work has two specific goals:
1. To find an appropriate feature selection method, so that the selected features can serve as an important basis for auxiliary diagnosis and make clinical diagnosis more efficient.
2. To select an appropriate machine learning model, optimize and correct it, and train a classifier suitable for clinical classification diagnosis, thereby improving the accuracy of clinical diagnosis.

Methods:
1. Data from 543 eligible subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) were randomly selected, including subject ID number, structural Magnetic Resonance Imaging (sMRI) images, Mini-Mental State Examination (MMSE) score, age, gender, and years of education. According to the diagnostic criteria for AD, the subjects were divided into Normal Cognitive (NC), Early Mild Cognitive Impairment (EMCI), Late Mild Cognitive Impairment (LMCI), and AD groups of 139, 220, 108, and 76 subjects, respectively.
2. Feature extraction: FreeSurfer software was used for feature extraction. After preprocessing of the sMRI images, 272 features were extracted, covering cortical surface area, subcortical volume, hippocampal subfield volume, cortical volume, and cortical thickness (70, 49, 16, 69, and 68 features, respectively).
3. Feature selection, in two parts. Part 1: L1-regularized Logistic Regression (L1-LR), L1-regularized Support Vector Machine (L1-SVM), and Gradient Boosting Decision Tree (GBDT) feature selection models were applied to the 272 brain imaging features, each yielding its own set of selected features. Part 2: the same three feature selection models were applied to 276 features, namely the 272 brain imaging features plus 3 demographic indicators (age, gender, years of education) and the MMSE score.
4. Classification model construction and evaluation: the two feature sets obtained in step 3 were each fed into three machine learning algorithms (L1-LR, L1-SVM, and GBDT) to train classification models that distinguish cognitively normal subjects from patients at different stages of AD. A 10-fold cross-validation strategy was used to evaluate each scheme.

Results:
1. Feature selection results:
(1) L1-LR feature selection model: from the 272 features, 65, 37, 22, 52, 38, and 41 features were selected for the NC-EMCI, NC-LMCI, NC-AD, EMCI-LMCI, EMCI-AD, and LMCI-AD pairs, respectively; from the 276 features, 67, 42, 11, 56, 21, and 20 features were selected. In the NC-EMCI group, MMSE score, age, gender, and years of education ranked 3rd, 9th, 22nd, and 39th. In the NC-LMCI group, MMSE score, age, and years of education ranked 3rd, 5th, and 23rd. In the NC-AD group, MMSE score, gender, and years of education ranked 1st, 4th, and 7th. In the EMCI-LMCI group, MMSE score, age, and years of education ranked 5th, 8th, and 30th. In the EMCI-AD group, MMSE score, age, and years of education ranked 3rd, 6th, and 17th. In the LMCI-AD group, MMSE score, age, and years of education ranked 1st, 3rd, and 10th.
(2) L1-SVM feature selection model: from the 272 features, 133, 86, 58, 112, 78, and 78 features were selected for the NC-EMCI, NC-LMCI, NC-AD, EMCI-LMCI, EMCI-AD, and LMCI-AD pairs, respectively; from the 276 features, 121, 82, 22, 113, 39, and 53 features were selected. In the NC-EMCI group, MMSE score, age, gender, and years of education ranked 4th, 14th, 31st, and 75th. In the NC-LMCI group, MMSE score, age, and years of education ranked 6th, 8th, and 46th. In the NC-AD group, MMSE score and gender ranked 1st and 10th. In the EMCI-LMCI group, MMSE score, age, gender, and years of education ranked 5th, 10th, 28th, and 64th. In the EMCI-AD group, MMSE score, age, and years of education ranked 3rd, 7th, and 23rd. In the LMCI-AD group, MMSE score, age, and gender ranked 2nd, 7th, and 14th.
(3) GBDT feature selection model: from the 272 features, 80 features were selected for each of the NC-EMCI, NC-LMCI, NC-AD, EMCI-LMCI, EMCI-AD, and LMCI-AD pairs; from the 276 features, 80 features were likewise selected. In the NC-EMCI group, age, MMSE score, gender, and years of education ranked 1st, 3rd, 16th, and 25th. In the NC-LMCI group, age and MMSE score ranked 20th and 23rd. In the NC-AD group, years of education ranked 32nd. In the EMCI-LMCI group, age ranked 71st. In the EMCI-AD group, MMSE score ranked 1st. In the LMCI-AD group, MMSE score, age, and years of education ranked 1st, 20th, and 73rd.
2. Classification prediction results:
(1) When the feature selection model and the classification model were of the same type, prediction performance was better than for combinations of different models.
(2) With the 272 brain imaging features, L1-LR achieved higher overall prediction accuracy and stability than L1-SVM and GBDT, and also performed better under 10-fold cross-validation.
(3) With the 276 features, the prediction accuracy of all three models improved to varying degrees. For the L1-LR model, accuracy ranged from 82.93% to 97.66%, sensitivity from 58.27% to 95.25%, and specificity from 86.26% to 100.00%. For L1-SVM classification, accuracy ranged from 58.27% to 95.25% and specificity from 86.26% to 100.00%. For GBDT, accuracy ranged from 82.71% to 97.26%, sensitivity from 41.45% to 100.00%, and specificity from 47.06% to 96.53%.

Conclusion:
1. Using the 276 features as classifier inputs improves classification performance and achieves higher accuracy. The 276 features comprise 272 brain imaging features derived from the cortex, hippocampus, and basal ganglia, plus 4 items: age, gender, years of education, and MMSE score.
2. The L1-LR and L1-SVM models based on the 276 features show high accuracy in distinguishing the different groups and could serve as auxiliary tools for clinical classification diagnosis.
3. The features selected by the L1-LR and L1-SVM models from the 276 features are clinically meaningful and reliable, and can serve as key indicators to focus on and monitor when distinguishing each pair of groups. In the L1-LR model, the feature categories ranked by importance as Cortical Thicknesses > Surface Areas > Cortical Volumes > Subcortical Volumes. In the L1-SVM model, the order was Cortical Thicknesses > Cortical Volumes > Surface Areas > Subcortical Volumes, and many hippocampal subfield features also contributed to classification.
4. The accuracies of the L1-LR and L1-SVM models are consistent, and the per-pair accuracy from high to low is NC-AD > EMCI-AD > NC-LMCI > LMCI-AD > NC-EMCI > EMCI-LMCI.
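The abstract does not say how the L1-LR feature selection was implemented. The following is a minimal pure-Python sketch, assuming standardized features, 0/1 labels for one group pair (for example NC = 0, AD = 1), and features ranked by the absolute value of their coefficients (an assumption; the ranking rule is not stated). It fits L1-regularized logistic regression by proximal gradient descent (ISTA): the soft-thresholding step drives weak coefficients exactly to zero, so the "selected" features are those with nonzero weights. The function names and the penalty strength `lam` are hypothetical; in practice a library solver such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |w|: shrink z toward zero by t, clipping at 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
    lr: float = 0.1, n_iter: int = 2000,
) -> Tuple[List[float], float]:
    """Minimize mean log-loss + lam * sum_j |w_j| by proximal gradient descent.

    X holds standardized feature rows, y holds 0/1 labels; the bias b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        # Gradient step on the smooth log-loss, then the L1 proximal step,
        # which sets small coefficients exactly to zero.
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
        b -= lr * grad_b
    return w, b


def select_features(names: Sequence[str], w: Sequence[float]) -> List[Tuple[str, float]]:
    """Keep features with nonzero weight, ranked by |weight| (largest first)."""
    kept = [(name, wj) for name, wj in zip(names, w) if wj != 0.0]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)
```

A larger `lam` selects fewer features, and a large enough penalty zeroes every coefficient. Replacing the log-loss gradient with a squared-hinge-loss gradient in the same loop would give an L1-regularized linear SVM, which is one common way to realize an L1-SVM selector.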
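The 10-fold cross-validation used to evaluate each scheme can be sketched in plain Python as follows. This assumes a simple shuffled, non-stratified split with a fixed seed; the abstract does not specify how folds were formed, and `k_fold_splits` is a hypothetical helper. Each subject lands in exactly one test fold, and a model would be trained on the other nine folds and scored on the held-out one.

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices once and cut them into k near-equal test folds.

    Returns (train_indices, test_indices) pairs; every index appears in
    exactly one test fold, and the remaining indices form that fold's training set.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        splits.append((train, test))
        start += size
    return splits
```

For the NC-AD pair (139 + 76 = 215 subjects), this yields five test folds of 22 subjects and five of 21. A stratified split, which keeps the NC:AD ratio roughly constant across folds, is a common alternative for imbalanced group pairs.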
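The accuracy, sensitivity, and specificity reported in the Results can be computed from a binary confusion matrix. A minimal sketch, assuming the more impaired group of each pair (for example AD in NC-AD) is coded as the positive class 1; the abstract does not state which group was treated as positive, and `binary_metrics` is a hypothetical helper.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # impaired subject correctly flagged
            else:
                fn += 1  # impaired subject missed
        else:
            if p == positive:
                fp += 1  # unimpaired subject wrongly flagged
            else:
                tn += 1  # unimpaired subject correctly cleared
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        # NaN when a class is absent, since the rate is then undefined.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

With 4 positives (3 detected) and 6 negatives (5 correctly cleared), this gives accuracy 0.8, sensitivity 0.75, and specificity 5/6.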
Keywords/Search Tags: Alzheimer's Disease, Mild Cognitive Impairment, Machine Learning, L1-regularized Logistic Regression, L1-regularized Support Vector Machine