
Regression Imputation Of Mixed Data Based On Penalized Likelihood

Posted on: 2022-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: S D Wang
GTID: 2480306482995929
Subject: Statistics
Abstract/Summary:
Missing data is an unavoidable problem in statistical analysis, and analyzing data with missing values remains both important and challenging. Because imputation is easy to use, methods that combine analysis with imputation are a popular way to address the problem, and scholars have proposed many imputation algorithms. With the advent of the big data era, however, data dimensions have grown and variables of different types are increasingly mixed, so traditional imputation algorithms may run for a very long time or fail to work at all. How to impute missing values in high-dimensional data, and especially in high-dimensional mixed data, is therefore an open problem. In this paper, we consider imputation for high-dimensional mixed data and propose the MIGRL algorithm. MIGRL is an extension of the mice algorithm that can impute missing values in high-dimensional mixed data while retaining the structure within dummy variables. The algorithm has two characteristics. First, it uses only the group lasso, which performs variable selection while recognizing the structure within dummy variables and is therefore more interpretable. Second, it uses Bayesian regression for the final imputation step, which captures parameter uncertainty. Through numerical simulations, we examine the imputation performance of MIGRL on limited data and compare it with the missForest, MI-RF, MI-CART and MI-pmm algorithms. The simulation results show that under the MAR missing-data mechanism, MIGRL has the best imputation performance, while under the MCAR mechanism, MIGRL performs better when imputing missing values for high-dimensional data of ? 9). Finally, we impute and analyze the SIPP data and RNA-Seq data.
Keywords/Search Tags:Mixed data, Missing data, Multiple imputation, Penalized likelihood, Group lasso
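
The two steps named in the abstract, group-lasso selection followed by a Bayesian regression draw, can be illustrated with a minimal sketch. This is not the thesis's MIGRL implementation, which the abstract does not specify. It assumes a single continuous variable with missing values, a squared-error group lasso solved by proximal gradient descent, and a Bayesian linear regression with a noninformative prior. The function names `group_lasso` and `bayes_impute_draw`, the toy data, and the penalty value `lam=0.3` are all hypothetical.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            # Block soft-thresholding: all dummies of a factor are kept or dropped together.
            z[idx] = 0.0 if norm <= thresh else (1 - thresh / norm) * z[idx]
        beta = z
    return beta

def bayes_impute_draw(X_obs, y_obs, X_mis, rng):
    """One imputation draw from Bayesian linear regression (noninformative prior, assumed)."""
    n, p = X_obs.shape
    cov = np.linalg.inv(X_obs.T @ X_obs + 1e-8 * np.eye(p))
    cov = (cov + cov.T) / 2  # enforce symmetry before the Cholesky factorization
    beta_hat = cov @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    sigma2 = resid @ resid / rng.chisquare(n - p)  # draw the residual variance
    beta = beta_hat + np.sqrt(sigma2) * np.linalg.cholesky(cov) @ rng.normal(size=p)
    # Adding noise to the prediction propagates residual as well as parameter uncertainty.
    return X_mis @ beta + rng.normal(scale=np.sqrt(sigma2), size=X_mis.shape[0])

def dummies(codes, k):
    """Dummy-code an integer factor with k levels, dropping level 0 as the reference."""
    return np.eye(k)[codes][:, 1:]

# Toy mixed data (hypothetical): one continuous predictor and two categorical factors.
rng = np.random.default_rng(0)
n = 200
x_cont = rng.normal(size=n)
cat = rng.integers(0, 3, size=n)        # 3-level factor that drives y
noise_cat = rng.integers(0, 4, size=n)  # 4-level factor unrelated to y
X = np.column_stack([x_cont, dummies(cat, 3), dummies(noise_cat, 4)])
groups = np.array([0, 1, 1, 2, 2, 2])   # one group per original variable
y = 2.0 * x_cont + np.array([0.0, 1.5, -1.5])[cat] + rng.normal(scale=0.5, size=n)
missing = rng.random(n) < 0.2           # MCAR missingness in y, for simplicity
obs = ~missing

# Step 1: group-lasso selection on the observed rows (standardized predictors).
Xs = (X - X.mean(0)) / X.std(0)
beta = group_lasso(Xs[obs], y[obs] - y[obs].mean(), groups, lam=0.3)
keep = beta != 0

# Step 2: Bayesian regression draw on the selected columns plus an intercept.
design = np.column_stack([np.ones(n), Xs[:, keep]])
y_imputed = y.copy()
y_imputed[missing] = bayes_impute_draw(design[obs], y[obs], design[missing], rng)
```

Repeating the second step with fresh random draws would give several completed data sets, which is the usual multiple-imputation setup. Treating each factor's dummy columns as one group is what keeps a categorical variable from being only partly selected.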