
Multi-Classification Prediction Of Disease Progression In Diffuse Large B-Cell Lymphoma Based On Hierarchical Classification

Posted on: 2022-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Huang
GTID: 2504306518975379
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin's lymphoma. Currently, the R-CHOP regimen (rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone) is the chemotherapy of choice for DLBCL, and the majority of patients achieve a complete response. However, some patients still experience disease progression during treatment or early recurrence after remission (a remission period of less than one year). After recurrence, the response rate to first-line standard treatment is low and the duration of response is short, and the disease eventually becomes refractory DLBCL, which is the main cause of death in DLBCL. To address this problem, this study aims to build a multi-class prediction model for the disease progression stage of DLBCL patients, to assist clinicians in diagnosing the disease progression stage and in choosing a reasonable regimen for later consolidation treatment.

Methods: 1. Simulation study: Three class-balancing methods (SMOTE, Borderline-SMOTE, and ADASYN) were used to balance 5 public databases with different imbalance rates. Six algorithms, namely BP neural network, support vector machine, and random forest together with their respective AdaBoost ensembles, were then used to construct direct multi-class models. Because indices such as the area under the ROC curve, F value, and G-means are mostly suited to binary classification problems, classification accuracy was selected as the evaluation index for direct multi-classification. The hierarchical classification method was used to construct indirect multi-class models, with the above 6 algorithms as candidate base classifiers. The hierarchical measurement method was applied, and the accuracy, sensitivity, F value, area under the ROC curve (AUC), and G-means of each classifier were used as evaluation indices. After the optimal model at each level was selected, the hierarchical measurement method was used to calculate the hierarchical accuracy, which was compared with the classification accuracy of the direct multi-class model. 2. DLBCL example application: More than 100 features were collected from DLBCL patients, including general condition, pathological information, PET-CT/CT imaging data, and treatment plan. Three feature selection methods (single-feature correlation ranking, recursive feature elimination, and random forest) were applied to screen out different feature subsets. The best-performing class-balancing method and multi-classification method from the simulation study were then used to construct a multi-class prediction model for the disease progression stage of DLBCL patients, and the effects of the feature subsets selected by the three feature selection methods on the performance of the multi-class model were compared.

Results: 1. Simulation study: (1) Balance database: In the direct classification method, the support vector machine performed best with Borderline-SMOTE class balancing (accuracy = 0.7440); the AdaBoost ensemble of the support vector machine performed best with ADASYN (accuracy = 0.7909); the BP neural network performed best with ADASYN (accuracy = 0.7740); and the remaining three models performed best with ADASYN (accuracy = 0.7895), ADASYN (accuracy = 0.7572), and Borderline-SMOTE (accuracy = 0.7595), respectively. Thus, in direct classification on the Balance database, ADASYN gave the optimal model for four of the six algorithms and Borderline-SMOTE for two. ADASYN therefore showed the better class-balancing performance, and the AdaBoost ensemble of the support vector machine with ADASYN was the best of all models (accuracy = 0.7909). In the first level of hierarchical classification, the analysis followed the same process as for direct classification, and Borderline-SMOTE was selected as the best class-balancing algorithm. Among all models, the AdaBoost ensemble of the BP neural network with Borderline-SMOTE performed best (accuracy = 0.8788, sensitivity = 0.8323, F = 0.8620, AUC = 0.8749, G-means = 0.8). In the second level, ADASYN was selected as the best class-balancing algorithm among the base classifiers, and random forest with ADASYN performed best among all models (accuracy = 0.8500, sensitivity = 0.8265, F = 0.8572, AUC = 0.8523, G-means = 0.8519). The hierarchical accuracy obtained by combining the best base classifiers of the two levels was 0.8316, higher than the 0.7909 of direct classification. In summary, the ADASYN algorithm and hierarchical classification performed best for modeling on the Balance database. (2) New Thyroid database: Following the same process, the Borderline-SMOTE algorithm and the hierarchical classification method performed best. (3) Hayes-Roth database: The Borderline-SMOTE algorithm and hierarchical classification performed best. (4) Contract database: The ADASYN algorithm performed best; the hierarchical accuracy of hierarchical classification was 0.8183 and the highest accuracy of direct classification was 0.8180, so the former was slightly better than the latter. (5) Wine database: The Borderline-SMOTE algorithm performed best; the hierarchical accuracy of hierarchical classification was 0.8186 and the highest accuracy of direct classification was 0.8172, so the former was slightly better than the latter. 2. DLBCL example application: The three feature selection methods (single-feature correlation ranking, recursive feature elimination, and random forest) screened out 10, 11, and 19 feature variables respectively, giving 3 feature subsets. The Borderline-SMOTE algorithm, the optimal class-balancing method selected in the simulation study, was used to balance the classes for each subset, and hierarchical classification was used to construct a multi-class prediction model of DLBCL disease progression. The hierarchical accuracy of the hierarchical classification model was 0.8864 with the feature subset selected by single-feature correlation ranking, 0.8479 with the subset selected by recursive feature elimination, and 0.9263 with the subset selected by random forest.

Conclusion: 1. According to the simulation study, the class-balancing performance of the Borderline-SMOTE and ADASYN algorithms is better than that of the SMOTE algorithm, and the two differ little from each other. In this study, Borderline-SMOTE was slightly better than ADASYN, and the overall classification performance of the hierarchical classification method was better than that of direct classification. The Borderline-SMOTE class-balancing method and the hierarchical classification method were therefore selected to construct a multi-class prediction model of DLBCL disease progression. 2. Three methods (single-feature correlation ranking, recursive feature elimination, and random forest) were used to perform feature selection on the case information database of DLBCL patients. Age, KPS score, disease grade, GCB status, and HBV-DNA were selected by all three methods. In this study, the multi-class prediction model of DLBCL disease progression stage built on the feature subset selected by random forest performed best.
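The three class-balancing methods compared in the abstract share one core step: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that interpolation step only, not the thesis's implementation. The function name `smote_oversample`, its parameters, and the toy data are hypothetical, and the sketch assumes at least two minority points.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Illustrative only: real libraries also handle the majority class,
    sampling ratios, and edge cases.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical 2-D minority class
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6)
```

Borderline-SMOTE and ADASYN mainly change how the base points are chosen: Borderline-SMOTE restricts them to minority samples near the class boundary, and ADASYN generates more synthetic samples around minority samples that are harder to learn. The sketch above picks base points uniformly, as in plain SMOTE.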
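The abstract notes that sensitivity, F value, and G-means are mostly suited to binary problems, which is why they are reported per level of the hierarchy. The sketch below shows how these indices follow from a binary confusion matrix. It assumes that "F value" means the F1 score and that G-means is the geometric mean of sensitivity and specificity; the abstract does not state either definition, and the counts are made up for illustration.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Compute common binary evaluation indices from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    # Assumption: "F value" is the F1 score (harmonic mean of precision and recall)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    # Assumption: G-means is sqrt(sensitivity * specificity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts for one level of a hierarchical classifier
m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```

Because G-means multiplies the recall of both classes, it drops sharply when a classifier ignores the minority class, which makes it more informative than accuracy alone on imbalanced data.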
Keywords/Search Tags:Diffuse large B-cell lymphoma, Multi-classification, Hierarchical classification, Imbalanced data, Feature selection