| With the rapid development of big data, precision medicine, and personalized marketing, heterogeneity and sparsity have become two important factors to consider when building models for data analysis. Variable selection, subgroup analysis, and parameter estimation methods for such data are therefore of great importance. First, for heterogeneous data that are longitudinal and subject to missingness, this dissertation studies a heterogeneous regression model with missing and longitudinal data. Under the frequentist framework, a multi-directional separation penalty function is combined with inverse probability weighting, and the ADMM algorithm is used to solve the parameter estimation, variable selection, and subgroup identification problems of the model. In particular, the proposed method achieves individualized variable selection, and the theoretical properties of the relevant estimators are established. Under the Bayesian framework, this dissertation first proposes a Bayesian latent subgroup identification model that can identify subgroups with heterogeneous effects, then specifies a spike-and-slab prior, derives the posterior distribution of the model parameters, and uses Gibbs sampling to carry out parameter estimation, personalized variable selection, and subgroup identification. The two methods are applied to the ACTG data and the ADNI data, respectively, and the analyses show that both methods achieve personalized estimation and yield better estimation results. Finally, this dissertation studies variable screening for ultra-high-dimensional heterogeneous categorical data with missing values. By borrowing information from the missingness indicator and combining it with a category-adaptive, model-free variable screening procedure for ultra-high-dimensional heterogeneous categorical data, it solves the variable screening problem for heterogeneous categorical data when the response variable is missing at random. The method is applied to the tumor microbiological data of TCGA and achieves better predictive performance for cancer classification. In summary, this dissertation establishes a relatively complete system of theory and methods for variable selection, subgroup analysis, and parameter estimation in the analysis of longitudinal and missing heterogeneous data, and applies these methods to real data. |