Extracting grouping structure or identifying homogeneous subgroups in regression has received increasing attention in recent years. For heterogeneous data, the homogeneity assumption of classical statistical models leads to biased estimates, so it is critical to identify homogeneous subgroups within a heterogeneous population. In high-dimensional data analysis, covariates often exhibit group structure. For example, a categorical variable can be represented by a group of dummy variables. Variable selection methods based on the sparsity assumption, such as LASSO, tend to arbitrarily select only one variable from each group, making the model difficult to interpret. Homogeneity is a more general assumption than sparsity: it allows more variables to be selected and gives information about the relationships between covariates, which enhances model interpretability and improves predictive performance. A large body of literature has developed subgroup analysis methods for identifying homogeneous group structure in a heterogeneous population. However, the penalty terms of existing subgroup analysis methods based on pairwise fusion penalties contain a large number of redundant pairwise differences of individual effects, leading to statistical and computational inefficiency. To solve this problem, we propose a subgroup analysis method based on the median regression model to estimate and identify homogeneous subgroups for network-linked data. We use both covariates and the network to identify subgroup structures in a heterogeneous population, where heterogeneity arises from unknown or unobserved latent factors. We automatically divide the sample into subgroups by penalizing the pairwise differences of intercepts for individuals connected by an edge in the network. The proposed method can also predict the response for new subjects with only covariates observed, by taking advantage of the network reconstructed after these new subjects are added. We solve the nonconvex optimization problem using the local linear approximation and establish the oracle properties of the proposed estimator under some regularity conditions. Our simulation studies show that the proposed method can effectively identify homogeneous subgroups. Finally, the advantages of the proposed method are further illustrated by the analysis of a real estate transaction dataset.

In addition, methods for identifying homogeneous subgroups of regression coefficients in high-dimensional data analysis have been well studied in the literature. However, little attention has been paid to sparse features. Such features yield design matrices in which many columns are highly sparse, so traditional statistical methods are no longer suitable. To deal with the challenges posed by sparse features, we propose a feature aggregation method based on composite quantile regression. A nonconvex pairwise fusion penalty is used to automatically detect and identify homogeneous subgroups of predictors, and predictors in the same subgroup are combined into a relatively dense latent factor. To implement the method, we propose an efficient algorithm based on the alternating direction method of multipliers framework, and establish the oracle property of the proposed estimators under some regularity conditions. Both simulation results and real data analysis show the effectiveness of the proposed method.