| Individualized treatment-effect design gives each patient a meaningful treatment plan and thereby optimizes personalized therapy. Subgroup analysis has become a popular way to estimate individualized treatment effects, but how to identify the group pattern when high-dimensional features are integrated, such as genetic data collected with biomedical tools, remains a major concern for data scientists. From a medical standpoint, studying treatment heterogeneity in the presence of high-dimensional characteristics helps to find the subgroup that benefits most and provides evidence for subsequent studies, which is crucial for the development of precision medicine. In this paper, we study subgroup analysis in high-dimensional data sets. First, we propose a heterogeneous, high-dimensional regression model with group and sparse structures. Some variables in the model have heterogeneous effects on the response, with a latent group structure, while others have a sparse structure, with a large number of zero effects. We use a quadratic loss function with a double concave penalty and the alternating direction method of multipliers (ADMM) to recover the group pattern and estimate the model parameters. Second, we establish two main results for the case where the structure of the model is known. The first is that the oracle least squares estimator is close to the true value with a certain probability. The second is that the objective function has a local minimum, and this local minimum converges in probability to the oracle least squares estimator. Finally, under three settings, namely single-heterogeneity high-dimensional, single-heterogeneity ultra-high-dimensional, and multi-heterogeneity high-dimensional, we carry out simulation studies on group-structure recovery and on the identification of important high-dimensional variables, and we carry out an empirical analysis of community and crime data sets. For the simulation studies, the effectiveness of the method is verified by comparing its results with those of homogeneous
regression and with related indicators. For the empirical part, the probability density curves of the residuals under grouping and the boxplots of the high-dimensional coefficient estimates under two kinds of penalties demonstrate the effectiveness of group identification and the sparsity of the high-dimensional variables. The innovation of this paper is to treat the heterogeneity and the high-dimensional characteristics of a single observation separately. On the one hand, a quadratic loss function with a double penalty is constructed to carry out subgroup analysis and variable selection simultaneously, which greatly improves estimation accuracy. On the other hand, in contrast to the limitations of model selection and parameter estimation in traditional linear regression models, the proposed double-penalty least squares method estimates the model parameters and recovers the structure in one step. The disadvantages are the restriction to a univariate response and the limited practical applicability that results from treating heterogeneous and high-dimensional variables separately in order to reduce the complexity of the model. |