| With the completion of the Human Genome Project (HGP), the post-genome era has arrived, and integrating data from multiple platforms has become increasingly common. The complexity of these data sources raises new challenges, among them how to handle "block missing data". A good imputation must be accurate and precise while preserving the variance-covariance structure of the original data. In this project, we evaluated several imputation methods based on statistical and machine learning techniques. An advanced imputation method was then applied to the WNT pathway to compare Cox regression models built on the original and imputed data sets. We obtained a lung cancer data set (DNA methylation and gene expression) from The Cancer Genome Atlas (TCGA) and constructed data sets with missing proportions of 5%, 20%, 35%, 50%, and 65%. Two statistical methods (mean imputation and Markov chain Monte Carlo, MCMC) and three machine learning methods (k-nearest neighbors, kNN; multilayer perceptron, MLP; random forest, RF) were applied. Imputation quality was assessed by estimation bias and the matrix 2-norm, and imputation time was recorded to identify a method that is both fast and effective. Based on this evaluation, MLP was adopted to impute missing data in the WNT pathway. We reduced the dimensionality of the data set with (I)SIS and then fitted a Cox regression model for a prognostic analysis at 5 years. After 1000 bootstrap replicates to ensure a stable AUC estimate, the imputed data set performed better than the original data set. MLP and kNN achieved high imputation quality and required relatively little time at every missing ratio tested. Mean imputation was the fastest, and its quality was high at a low missing ratio (≤5%). However, the imputation quality of all methods declined as the missing ratio increased. RF and MCMC imputed more accurately than the mean approach but took more time at high missing ratios. Overall, machine learning methods such as MLP and kNN proved suitable for joint imputation of DNA methylation and gene expression data. |
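
The evaluation protocol summarized above (mask a complete data set at a given proportion, impute, then compare against the truth) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the data are synthetic instead of TCGA, entries are masked uniformly at random rather than in blocks, bias is taken as the mean signed error on the masked entries, and the matrix 2-norm is computed on the difference between the covariance matrices of the original and imputed data. The function names `mask_at_random`, `mean_impute`, and `evaluate` are hypothetical. Only mean imputation is shown; kNN, MLP, RF, or MCMC imputers would be substituted at the same step.

```python
import numpy as np


def mask_at_random(X, proportion, rng):
    """Return a float copy of X with `proportion` of its entries set to NaN.

    Assumption: entries are masked uniformly at random, not in blocks.
    """
    X_missing = np.array(X, dtype=float)  # np.array copies by default
    n_missing = int(round(proportion * X_missing.size))
    idx = rng.choice(X_missing.size, size=n_missing, replace=False)
    X_missing.flat[idx] = np.nan
    return X_missing


def mean_impute(X_missing):
    """Replace each NaN with the mean of the observed values in its column."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)


def evaluate(X_true, X_imputed, mask):
    """Return (bias, norm2) for one imputed data set.

    bias  : mean signed error over the masked entries (assumed definition).
    norm2 : spectral norm of the difference between the two covariance
            matrices, a proxy for distortion of the covariance structure.
    """
    bias = float(np.mean(X_imputed[mask] - X_true[mask]))
    cov_diff = np.cov(X_true, rowvar=False) - np.cov(X_imputed, rowvar=False)
    norm2 = float(np.linalg.norm(cov_diff, 2))
    return bias, norm2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for the omics matrix: 200 samples, 10 correlated features.
    X = rng.normal(size=(200, 10)) + rng.normal(size=(200, 1))
    for p in (0.05, 0.20, 0.35, 0.50, 0.65):
        X_missing = mask_at_random(X, p, rng)
        mask = np.isnan(X_missing)
        bias, norm2 = evaluate(X, mean_impute(X_missing), mask)
        print(f"missing={p:.2f}  bias={bias:+.4f}  cov 2-norm={norm2:.4f}")
```

Mean imputation shrinks variances and covariances toward zero, so the covariance 2-norm in this sketch is expected to grow with the missing proportion, which is in line with the reported loss of imputation quality at higher missing ratios.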