
Applications Of Adaptive Elastic Net Procedure For Cox Model

Posted on: 2018-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhao
Full Text: PDF
GTID: 2334330536963441
Subject: Epidemiology and Health Statistics
Abstract/Summary:
Objective: In survival analysis, the Cox model is the classic model for analyzing survival data. With the rapid development of high-throughput technology, the expression of tens of thousands of genes can now be measured. Because sample sizes are usually very small, it is particularly important to screen, from this large pool, the genes that are strongly associated with disease. The traditional Cox model is not suited to gene expression data that are high-dimensional and strongly correlated. Although the classic Lasso performs variable selection and coefficient estimation for high-dimensional data, it applies the same penalty to all variables, its estimates are biased, and it selects correlated variables poorly. To obtain a more accurate and sparser model, this thesis introduces the Adaptive Elastic Net into the Cox model and compares it with three other variable selection methods (Lasso, Adaptive Lasso, and Elastic Net), in order to obtain a more realistic model and to lay a foundation for future methodological research on high-dimensional data analysis.

Method: (1) Data simulation and real data analysis were conducted in R 3.3.0 using four R packages: "Matrix", "MASS", "survival", and "Coxnet". The "Coxnet" package uses a one-step coordinate descent algorithm. In real gene expression data, very few genes are associated with the disease, so the final model has a sparse coefficient structure. The algorithm is especially suited to this situation: by exploiting the sparsity of the coefficients, it runs extremely fast and processes data efficiently. (2) To reflect the high dimensionality, correlation between variables, small sample size, and censoring of gene expression data in survival analysis, the simulated data were generated with the correlation between variables set to 0.3, 0.6, or 0.9 and the censoring proportion set to 20%, 50%, or 70%, giving nine simulation scenarios, each with a sample size of 100 and 1000 variables. In each scenario, the coefficients of the first 10 variables are set to 5 (high-information variables), those of the next 10 variables are set to 2 (low-information variables), and those of all remaining variables are set to 0 (zero-information variables). Each scenario is repeated 1000 times, the four variable selection methods are applied, and their selection results are compared across the three groups of variables. The optimal tuning parameter λ is chosen by five-fold cross-validation. (3) The example data come from the study of Van't Veer in the Netherlands, a DNA microarray analysis of primary breast tumors. This thesis selects 78 patients without lymph node metastasis, with 4751 genes per case. The end event is defined as whether the breast cancer patient developed distant metastasis (1 = metastasis, 0 = no metastasis). The four methods are used to select variables in the example data and to estimate the coefficient of each variable in the best model. The optimal tuning parameter λ is again chosen by five-fold cross-validation.

Results: (1) In the simulation of the four variable selection methods, when the censoring proportion is 20%, the percentage of the first group of variables entering the final model is close to 100%; that is, the high-information variables are almost all selected. The percentage of the second group of variables entering the final model is also high. Comparing the methods, this percentage is lower for Adaptive Lasso than for Lasso, and lower for Adaptive Elastic Net than for Elastic Net. As for the number of variables selected into the final model, the Elastic Net penalty generally selects more variables than the Lasso penalty alone, especially when the variables are strongly correlated, and the adaptive Lasso penalty selects fewer variables than the Lasso penalty alone. (2) In the example data analysis, the numbers of variables in the final models of Lasso, Adaptive Lasso, Elastic Net, and Adaptive Elastic Net are 11, 4, 21, and 8, respectively, and the optimal tuning parameters λ are 0.2072, 0.2501, 0.3435, and 0.500, respectively. Adaptive Lasso selects markedly fewer variables than Lasso, and its coefficients are smaller in absolute value than those of Lasso. Likewise, Adaptive Elastic Net selects markedly fewer variables than Elastic Net, and its coefficients are smaller in absolute value than those of Elastic Net.

Conclusion: (1) Both the Elastic Net and the Lasso can handle high-dimensional survival data, but the Elastic Net selects more strongly correlated variables into the final model because it has the grouping effect, a property the Lasso lacks. (2) For survival data with high dimension and strongly correlated variables, the variable selection accuracy of the Adaptive Elastic Net is better than that of the Elastic Net, the Lasso, and the Adaptive Lasso.
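The simulation design described in the Method part of the abstract can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not the thesis's R code. The function name simulate_dataset, the equicorrelated covariate structure, the unit exponential baseline hazard, and the censoring mechanism (a randomly chosen fraction of subjects censored at a uniform fraction of their event time) are all assumptions, since the abstract does not specify them. Only the sample size, the number of variables, the coefficient values, and the correlation and censoring levels come from the abstract.

```python
import math
import random

# Scenario grid from the abstract: 3 correlations x 3 censoring proportions.
SCENARIOS = [(rho, c) for rho in (0.3, 0.6, 0.9) for c in (0.2, 0.5, 0.7)]


def simulate_dataset(n=100, p=1000, rho=0.3, censor_rate=0.2, seed=0):
    """Return (X, log_time, status) for one simulated data set (p >= 20).

    status is 1 for an observed event and 0 for a censored observation.
    Times are kept on the log scale; the Cox partial likelihood depends
    only on the ordering of the times, so this loses nothing.
    """
    rng = random.Random(seed)

    # Coefficients from the abstract: first 10 are 5, next 10 are 2, rest 0.
    beta = [5.0] * 10 + [2.0] * 10 + [0.0] * (p - 20)

    # ASSUMPTION: equicorrelated covariates, x_j = a*shared + b*noise_j,
    # which gives unit variance and pairwise correlation rho.
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)

    X, log_time = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        row = [a * shared + b * rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = sum(bj * xj for bj, xj in zip(beta, row))  # linear predictor
        # ASSUMPTION: baseline hazard 1, so T = E / exp(eta) with E ~ Exp(1).
        e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
        log_time.append(math.log(e) - eta)
        X.append(row)

    # ASSUMPTION: censor a fixed fraction of subjects, each at a uniform
    # fraction of its event time, so the censoring proportion is exact.
    censored = set(rng.sample(range(n), round(censor_rate * n)))
    status = []
    for i in range(n):
        if i in censored:
            log_time[i] += math.log(1.0 - rng.random())  # log of U in (0, 1]
            status.append(0)
        else:
            status.append(1)
    return X, log_time, status
```

Looping over SCENARIOS and calling simulate_dataset with a different seed for each of the 1000 repetitions would mirror the shape of the design; the variable selection step itself (Lasso, Adaptive Lasso, Elastic Net, Adaptive Elastic Net) is not shown here.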
Keywords/Search Tags: high-dimensional data, Cox model, Lasso, Adaptive Lasso, Elastic Net, Adaptive Elastic Net