
Development Of A Predictive Model Of Distant Metastasis Of Follicular Thyroid Carcinoma Based On Random Forest

Posted on: 2019-12-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y N Ji  Full Text: PDF
GTID: 2404330566970770  Subject: Information Science
Abstract/Summary:
Objective: As imaging diagnostic techniques have become more accurate, the incidence of thyroid cancer has increased year by year, with an incidence rate of 15.0 per 100,000 people. The prognosis of thyroid cancer is good, with a 5- to 10-year survival rate of about 80% to 95%. However, the survival rate of patients with distant metastasis falls to about 40%. Few studies currently address the prognosis of thyroid cancer, and the prognosis of distant metastasis has received little attention. Given the high incidence of thyroid cancer and the poor prognosis of patients with distant metastasis, this study takes patients with follicular thyroid carcinoma (FTC), who are prone to distant metastasis, as its research object. Prognosis data were obtained from the SEER database, and the random forest algorithm was applied to build an FTC distant metastasis prediction model, with the aim of assisting clinicians in diagnosis and improving the survival of patients with distant metastasis.
Methods: The SEER*Stat software provided by the SEER database was used to extract prognosis data for patients with follicular thyroid carcinoma from 2004 to 2014. Candidate prognostic variables were selected according to the NCCN guidelines, AJCC guidelines, SEER guidelines, the Collaborative Stage Data Collection System and the existing literature. They comprised gender, age, race, marital status, living area, tumor size, year, histological type (ICD-O-3), surgery, lymph node surgery, extension, regional lymph node metastasis and multifocal tumor. The outcome variable was the occurrence of distant metastasis. The dataset was first preprocessed by deleting records with missing values, converting the data and discretizing it, and was then divided into a training set and a testing set at a ratio of about 7:3. SMOTE was used to rebalance the training set. On the new training set, univariate analysis and logistic regression were carried out in SPSS 20.0, and random forest variable importance was computed in RStudio, to select the characteristic variables. The prediction model was then constructed from the new training set with the random forest algorithm. The testing set was used to evaluate the performance of the prediction model and to compare it with decision tree and artificial neural network models. The evaluation indexes were specificity, sensitivity, G-mean, accuracy and area under the ROC curve.
Results: The original dataset contained 5278 samples, of which 203 (about 1 in 25) were patients with distant metastasis, so the dataset was imbalanced. The new training set adjusted by SMOTE contained 5616 samples, with positive and negative samples roughly balanced. The highly correlated predictors were age, extension, tumor size, regional lymph node metastasis and histological type (ICD-O-3). The G-mean and area under the ROC curve of the random forest model were 0.767 and 0.837, respectively, better than the 0.367 and 0.565 of the decision tree and the 0.629 and 0.75 of the artificial neural network.
Conclusion: In this study, patients with follicular thyroid carcinoma were selected as the research object, and a distant metastasis prediction model was constructed with the random forest algorithm. Its G-mean and area under the ROC curve were 0.767 and 0.837, respectively. Adjusting the imbalanced training set with SMOTE improved the classification accuracy for positive samples. The SEER database used in this study mainly contains follow-up data. In future studies, laboratory test data could be added to improve the accuracy of the model and better assist clinical decision-making.
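The abstract reports G-mean alongside accuracy because the data are heavily imbalanced (203 positives among 5278 samples). The sketch below is not the thesis code, which used SPSS and R; it is a minimal Python illustration, with hypothetical function names, of the standard G-mean definition (the geometric mean of sensitivity and specificity) and of the interpolation step at the core of SMOTE. The "all negative" classifier is an illustrative assumption applied to the full-dataset counts, not a result from the study.

```python
import math
import random


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def smote_point(sample, neighbor, rng):
    """Interpolation step of SMOTE: a random point on the segment between
    a minority-class sample and one of its minority-class neighbors.
    (Neighbor search is omitted; this is only the synthesis step.)"""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(sample, neighbor)]


# Illustrative assumption: a classifier that labels every case as negative,
# evaluated on the full-dataset counts quoted in the abstract.
POSITIVES, TOTAL = 203, 5278
all_negative = dict(tp=0, fn=POSITIVES, tn=TOTAL - POSITIVES, fp=0)

print(round(accuracy(**all_negative), 3))  # 0.962
print(g_mean(**all_negative))              # 0.0

# One synthetic minority sample between two hypothetical feature vectors.
print(smote_point([1.0, 4.0], [3.0, 8.0], random.Random(0)))
```

A classifier that never predicts metastasis still scores about 96% accuracy on these counts, yet its G-mean is zero, which is why G-mean and the area under the ROC curve are the more informative indexes for comparing the three models.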
Keywords/Search Tags:SEER, follicular thyroid carcinoma, distant metastasis, random forest, SMOTE, prediction model