Breast cancer is the most prevalent malignant tumor in women worldwide and a serious threat to women's physical and mental health. With advances in breast cancer screening and treatment, the postoperative survival rate of breast cancer patients has steadily increased, and prognostic models that predict the individual disease course after breast cancer surgery have become increasingly important. However, since the 1950s, survival models, from numerical fitting models to machine learning methods, have not treated the survival process as a stochastic process. In recent years, neural networks have been introduced into survival analysis, but these models still rely on the log-linear risk assumptions of numerical methods or on pre-determined risk distributions. Replacing one regressor with another has not steadily improved the concordance index (C-index) of discrete-time survival analysis, and we believe that constructive improvements to the modeling of discrete survival analysis are needed.

Current modeling of discrete survival analysis has two problems: (1) most discrete survival analysis models treat the survival process as a linear transformation of the regressors fitted to the survival states; (2) a single regressor is not sufficient to fully describe how risk is distributed and accumulated in discrete survival analysis. We believe that discrete survival analysis should treat the survival process as a whole that can be optimized, which requires optimizing both its loss function and its model structure.

The main work and contributions of this thesis are as follows.

1. We collected and reviewed the numerical models, machine learning models, and recent survival analysis models developed since breast cancer prognostic models first appeared. We also collected several internationally used breast cancer survival analysis datasets and described their distributions to support the discrete-time survival analysis studies that follow.

2. To address the problem that survival process modeling still fits survival status through a linear transformation of the regressors, this thesis proposes a multi-task banded regression model for survival analysis of slowly progressing, long-duration diseases such as cancer. First, a multi-task regression model that combines the regressor with a banded transformation is established by introducing a banded calibration approach, and the validity and well-behavedness of the banded transformation matrix are mathematically derived and verified. Second, two special cases of the multi-task banded matrix are given. Finally, the validity of the proposed model for individual survival risk prediction is verified on real-world cancer and acute-illness survival analysis datasets. At the 95% confidence level, the concordance index of the proposed model on the METABRIC dataset is about 0.05 higher than that of the current mainstream general neural network survival analysis model.

3. To address the problem that the single regressor of current multi-task survival analysis models is not treated as a martingale, we show that the regression results of a single regressor are martingales that can be superimposed, and we discuss the properties of their superposition. On this basis we give a new loss function and regression form for multi-task regression based on cumulants, and we derive and verify the theoretical basis and validity of this regression approach. We then extend the banded calibration of the previous chapter, introduce a new decomposition and calibration form, and verify that it is well behaved. This chapter adopts a more rigorous experimental validation method. At the 95% confidence level, correcting the loss function improves the concordance index on the METABRIC dataset by about 0.05 over the current mainstream general neural network survival analysis model. On the SUPPORT dataset, adding the new banded calibration function improves the robustness of the algorithm under certain circumstances and raises the concordance index by about 0.04.

The experimental results show that both approaches are effective ways to improve the concordance index of discrete survival analysis, with better fitting results and higher concordance indices than other discrete-time survival analysis models. In terms of the integrated Brier score, the proposed model achieved the best results among all discrete-time survival analysis models.
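
The results above are reported in terms of a consistency coefficient, which is assumed here to be Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the patient with the shorter observed survival time was assigned the higher predicted risk. The sketch below is a minimal illustration of that metric under this assumption, not the evaluation code used in the thesis. The function name `concordance_index` and the choices to skip tied survival times and to count tied risks as 0.5 are illustrative simplifications.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : observed time for each patient (event or censoring time)
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score; higher means earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk for the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so an improvement of about 0.05 is a shift on this 0 to 1 scale.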