Prognosis Analysis Of Breast Cancer Based On Gene Data

Posted on: 2021-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: S L Yu
GTID: 2404330626958915
Subject: Computer technology
Abstract/Summary:
Breast cancer has become the malignancy with the highest incidence among women, and its incidence is increasing year by year. Accurate prediction of breast cancer prognosis is therefore of great clinical significance. Survival prediction is an important part of prognosis prediction: improving it protects the physical and mental health of patients and helps clinicians make treatment decisions. In recent years, with the development of bioinformatics technology and the improvement of cancer data resources, subjective experience and traditional statistical analysis can no longer fully extract the information in these data. On the one hand, cancer data are incomplete: the records of some samples are missing, and some test data from measurement instruments are lost. On the other hand, cancer data are heterogeneous: simply combining multiple data sets causes redundancy and makes the results hard to explain biologically. To integrate breast cancer data effectively and predict the survival of breast cancer patients more accurately, we propose a data fusion method based on the XGBoost model. This paper describes the model as applied to breast cancer data in three respects.

(1) Missing values. The XGBoost model can be trained on samples without filling in their missing values. This retains the original information in the samples, and the assignment of missing values at each split is revised over successive iterations, so that the final treatment of missing values moves closer to the true values. We therefore keep samples with missing values instead of deleting them, which greatly increases the size of the training set.

(2) Prediction performance. We first compare the XGBoost model on single-modal and multi-modal data sets; the multi-modal data sets give better results. We then compare the fusion algorithm based on the XGBoost model with a fusion algorithm based on a DNN model. The results show that the XGBoost-based fusion algorithm not only outperforms the other models on the multi-modal data set but is also robust on single-modal data sets.

(3) Biological meaning. Traditional machine learning algorithms aim to train models that perform well in survival prediction but ignore the biological meaning hidden in the data. Mining survival-related gene markers from gene data sets is of great value to medical workers for drug research and treatment decision-making. In this paper, 141 genes were identified by the XGBoost model and analyzed by GO enrichment analysis, KEGG pathway analysis, and interaction network analysis of the encoded proteins. The results show that these genes are closely related to cell division, apoptosis, cell proliferation, cancer pathways, and other biological processes.

In conclusion, to demonstrate the applicability of the XGBoost-based fusion algorithm to breast cancer survival prediction, we started from a large body of clinical and gene data from breast cancer patients, used traditional statistical analysis to screen out breast cancer related genes and clinical characteristics, and then built a fusion algorithm based on the XGBoost model to predict survival. To illustrate the effectiveness of the model, we compared it on single-modal and multi-modal data sets and against the DNN model. Finally, 141 gene markers were screened out and their biological functions analyzed, which will play an important role in further study of breast cancer related drugs and clinical treatment.
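The abstract's point about training without filling in missing values rests on how gradient-boosted trees of the XGBoost kind treat a missing entry at each split: both "send missing left" and "send missing right" are tried, and the direction with the larger gain is kept as the default. The following is a minimal pure-Python sketch of that idea for a single feature, not the thesis's implementation and not the library's code. The function names, the use of `None` for a missing value, and the toy data are assumptions made for illustration, and the gain formula omits the constant factor and the complexity penalty.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Gain of a split from gradient (g) and Hessian (h) sums.

    Simplified: the 1/2 factor and the per-leaf penalty are left out.
    """
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return score(g_left, h_left) + score(g_right, h_right) - parent


def best_split_with_default(values, grads, hess, lam=1.0):
    """Return (gain, threshold, default_direction) for one feature.

    `values` may contain None for missing entries (an assumption of
    this sketch). Missing entries are never imputed: their gradient
    statistics are added to the left or the right child, whichever
    gives the larger gain.
    """
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grads, hess) if v is not None
    )
    g_miss = sum(g for v, g in zip(values, grads) if v is None)
    h_miss = sum(h for v, h in zip(values, hess) if v is None)
    g_total = sum(g for _, g, _ in present)
    h_total = sum(h for _, _, h in present)

    best = (0.0, None, None)
    g_left = h_left = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        g_left += g
        h_left += h
        next_v = present[i + 1][0]
        if next_v == v:
            continue  # no threshold fits between equal values
        threshold = (v + next_v) / 2
        g_right = g_total - g_left
        h_right = h_total - h_left
        candidates = (
            (split_gain(g_left + g_miss, h_left + h_miss,
                        g_right, h_right, lam), "left"),
            (split_gain(g_left, h_left,
                        g_right + g_miss, h_right + h_miss, lam), "right"),
        )
        for gain, direction in candidates:
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best
```

As a usage example, take a hypothetical gene-expression feature `[1.0, 2.0, 8.0, 9.0, None, None]` with labels `[0, 0, 1, 1, 1, 1]` and squared-error statistics at an initial prediction of 0 (gradients equal to minus the label, Hessians equal to 1). The two samples with missing values both have label 1, so the sketch picks the threshold between 2.0 and 8.0 and routes missing values to the right child, alongside the high-expression samples. No sample is dropped and no value is filled in.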
Keywords/Search Tags: breast cancer, gene, xgboost model, survival prediction, functional analysis