Prediction Of Breast Cancer Survival Based On Combined Model Of Machine Learning

Posted on: 2021-04-02
Degree: Master
Type: Thesis
Country: China
Candidate: X Ran
Full Text: PDF
GTID: 2404330602483966
Subject: Applied statistics
Abstract/Summary:
Cancer has long been one of the major threats to human health and life, with rising morbidity and mortality, and it is becoming the leading cause of death. Many medical research institutions are therefore devoted to cancer research, especially to cancer survival prediction. Breast cancer is a common invasive tumor in women and its incidence is also increasing, so it is particularly important to establish a model for predicting breast cancer prognosis. Computational models for predicting breast cancer survival have already been proposed, but many studies rely on traditional regression methods or on a single machine learning model. This paper focuses on the application of machine learning algorithms to the prediction of breast cancer survival. To combine the advantages of different single machine learning models in terms of stability and accuracy, the research is carried out from the perspective of combined machine learning models.

The study is based on data from the Surveillance, Epidemiology, and End Results (SEER) program database of the National Cancer Institute, covering the clinical management of breast cancer patients from 2010 to 2015. The data were first preprocessed: records with missing values were deleted, and each patient's 5-year survival status (survival or death) was determined from the follow-up time and recorded survival status. Undersampling was then applied to address class imbalance, so that the two classes were roughly balanced.

Support vector machine and logistic regression algorithms were selected to construct two single models, and their empirical results were compared. The theory of the two algorithms was studied first; a five-year survival prediction model was then obtained with each algorithm by learning from the training set, and the performance of each model on indicators such as accuracy and recall was evaluated on the test set. The results showed that the support vector machine performed better than the logistic regression model.

There are two ways to construct a combined model: a serial structure and a parallel structure. In the serial combination, the predictions of the support vector machine are used as an additional input variable of the logistic regression, while the other input variables remain unchanged, and the test set is used to calculate each evaluation index of the model. The results show that the model with this additional input variable has better prediction performance than the original single logistic regression model. In the parallel combination, the outputs of the two single models are assigned different weights and combined into a new prediction, and the best combination is selected from all the given weight combinations. The results show that the parallel combined model has better prediction performance than the two single models. Comparing the two combined models, the classification indexes of the serial combination are better than those of the parallel combination under all given weights.

The study finds that, when machine learning is used to build a breast cancer survival prediction model, both combined models predict better than the two single models, and the serial combined model predicts better than the parallel one. For future work on cancer survival prediction, combined models can also be built with other methods and from other single models, which offers new ideas for further research.
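The parallel combination described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which model outputs are combined, which weights were tried, or which metric selected the best combination. The sketch assumes each single model outputs a probability of death within five years, that the weights are a simple grid from 0 to 1, and that accuracy at a 0.5 threshold is the selection criterion. All function names and the toy data are hypothetical.

```python
def combine(p_svm, p_lr, w):
    """Weighted average of two models' predicted probabilities.

    w is the weight given to the first model; the second gets 1 - w.
    """
    return [w * a + (1.0 - w) * b for a, b in zip(p_svm, p_lr)]


def accuracy(y_true, probs, threshold=0.5):
    """Share of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def best_weight(y_true, p_svm, p_lr, weights):
    """Try every candidate weight and keep the most accurate combination.

    Ties are resolved in favour of the first weight encountered.
    """
    best_w, best_acc = None, -1.0
    for w in weights:
        acc = accuracy(y_true, combine(p_svm, p_lr, w))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy data (made up for illustration): 1 = death within five years.
y_true = [1, 0, 1, 0]
p_svm = [0.9, 0.2, 0.4, 0.3]   # hypothetical SVM outputs, wrong on sample 3
p_lr = [0.7, 0.6, 0.8, 0.2]    # hypothetical logistic outputs, wrong on sample 2

weights = [i / 10 for i in range(11)]  # assumed grid: 0.0, 0.1, ..., 1.0
w, acc = best_weight(y_true, p_svm, p_lr, weights)
print("SVM alone:", accuracy(y_true, p_svm))
print("Logistic alone:", accuracy(y_true, p_lr))
print("Best weight on SVM:", w, "combined accuracy:", acc)
```

In this toy case each single model misclassifies a different sample, so a suitable weighting corrects both errors. In practice the weight would be selected on data held out from the final test set, since choosing it on the test set itself gives an optimistic estimate.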
Keywords/Search Tags:Breast cancer, Survival prediction, Composite model, Machine learning