| Hepatocellular carcinoma (HCC) is the most common type of liver cancer. It ranks among the top five cancers by incidence and among the top four by mortality. China bears a heavy share of this burden: Chinese patients account for 55% of people with HCC worldwide, and about half of them have died. There are about 800,000 new cases of HCC, of which about 430,000 are in China. When treating HCC, doctors must answer questions such as: how long will the patient survive, and which factors affect the patient's survival time? Answering these questions correctly would be of great help to doctors and researchers. With reliable answers, doctors could choose a better therapy for each patient, and researchers would benefit as well. In this paper, we analyze the survival data of 4162 patients treated at Zhongshan Hospital in Shanghai from 2003 to 2012 from three aspects: time horizon, variable selection, and model selection. We use univariate analysis and logistic regression to explore the potential factors that affect patients' survival time after resection surgery. We then use the proportional hazards model and the random survival forest to establish prognostic models of survival time after resection surgery. First, we establish ten-year and three-year prognostic models, from which we learn that the short-term (three-year) model gives better accuracy. 
In this part, we use univariate analysis to identify potential factors and then enter these factors into the Cox proportional hazards model to establish the prognostic model. To assess the stability of the Cox proportional hazards model, we then apply a second variable selection method: we compute the weight of evidence and information value of every variable, use the variable selection principle proposed by Naeem Siddiqi in 2006 to select potential factors, and enter them into a binary logistic regression model to select the factors for the Cox proportional hazards model. We find that the different variable selection methods affect the accuracy of the prognostic model only slightly, which means that the Cox proportional hazards model is quite stable. As mentioned above, the Cox proportional hazards model is quite stable in this study. However, the particular form of the model depends on our subjective choice, and we could not estimate its cumulative hazard function, so we could not predict the survival rate of an individual patient from given data. The existing literature reports that the random survival forest gives higher accuracy than the Cox proportional hazards model, and that it both detects interaction effects among the variables and estimates the cumulative hazard function automatically. Therefore, in the last part of the empirical analysis, we use the random survival forest to establish a prognostic model and check whether it performs better; we find that the accuracy of this prognostic model is also about 72%. Finally, we give the survival curve of a patient whose relevant data are given. In conclusion, we find that the accuracy of all three models is about 72%. However, because the form of the Cox proportional hazards model is chosen subjectively, its accuracy may decline when it is given different data or specified in another form. 
The random survival forest, by contrast, detects interaction effects among the variables and estimates the cumulative hazard function automatically, and it also gives more stable accuracy. We therefore consider the random survival forest the best prognostic model for estimating the curative effect of adjuvant treatment in HCC patients after resection surgery. The primary factors that affect the survival rate of HCC patients after resection surgery include cancer embolus (yes or no), tumor size, hepatitis B e antigen (positive or negative), interferon therapy (yes or no), and interventional therapy (yes or no), among others. |
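
As an illustration of the weight-of-evidence screening step mentioned in the abstract, the following is a minimal Python sketch, not the authors' code. It assumes a binary outcome (event or no event), categorical predictors, the common convention WoE = ln(share of non-events / share of events), and an illustrative information-value cutoff of 0.02. The function names, the toy data, and the cutoff are all assumptions, since the abstract does not state the exact rule that was applied.

```python
import math
from collections import Counter


def woe_iv(values, events):
    """Return ({category: WoE}, IV) for one categorical variable.

    values: category label per patient (hypothetical labels such as "yes"/"no")
    events: 1 if the outcome occurred, 0 otherwise
    """
    event_counts = Counter(v for v, e in zip(values, events) if e == 1)
    non_event_counts = Counter(v for v, e in zip(values, events) if e == 0)
    total_events = sum(event_counts.values())
    total_non_events = sum(non_event_counts.values())

    woe, iv = {}, 0.0
    for category in set(values):
        n_event = event_counts[category]
        n_non_event = non_event_counts[category]
        if n_event == 0 or n_non_event == 0:
            # WoE is undefined for an empty cell; merge sparse categories first.
            raise ValueError(f"category {category!r} has an empty cell")
        share_event = n_event / total_events
        share_non_event = n_non_event / total_non_events
        woe[category] = math.log(share_non_event / share_event)
        iv += (share_non_event - share_event) * woe[category]
    return woe, iv


def screen_by_iv(iv_by_variable, cutoff=0.02):
    """Keep variables whose IV reaches the cutoff (the cutoff is an assumed value)."""
    return sorted(name for name, iv in iv_by_variable.items() if iv >= cutoff)


# Toy data, invented purely for illustration (not taken from the study).
embolus = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
died = [1, 1, 1, 0, 1, 0, 0, 0]
woe, iv = woe_iv(embolus, died)
print(woe, iv)
```

In a workflow like the one described, variables that pass such a screen would then be candidates for the binary logistic regression step, and the factors retained there would enter the Cox proportional hazards model.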