| Variety and heterogeneity are basic characteristics of big data. Weak signals and high noise often make model-based analysis less effective. In real data analysis, the inherent structural information of the data can serve as highly effective auxiliary information. At the same time, the structural characteristics of the data are often themselves the goal of the analysis. It is therefore necessary to take structural information into account in methodological research on high-dimensional data. Before presenting our study, we review in detail the cure rate model, the finite mixture of regression model, and multi-source data analysis methods, all of which are oriented toward data heterogeneity or variety. Under these model frameworks, our penalty function-based study explores how to use and extract structural information. Specifically, we first propose a penalization method for estimating the mixture cure rate model in which we explicitly consider the structural effects of covariates. Unlike the existing literature, which places a strict constraint on covariate effects, we study the structures of covariate effects by examining the magnitudes of the regression coefficients and allow them to be similar but not equal, which is more flexible. Depending on data characteristics, we develop different penalties and corresponding computational algorithms. The proposed method has an intuitive formulation and can be implemented effectively. Simulation shows that the proposed method outperforms the alternatives by estimating parameters and identifying relevant variables more accurately. Two breast cancer datasets, one with low-dimensional clinical variables and the other with high-dimensional genetic variables, are analyzed. With a perspective different from that of existing studies and with advantages demonstrated in simulation and data analysis, the proposed method is a warranted addition to the existing ones. Second, for the finite mixture of regression model, a structured penalization
approach that can adapt to high-dimensional data is developed for regularized estimation and selection of important variables and, equally importantly, for identification of the underlying covariate effect structure. The existing literature pays little attention to differences among important covariates, which can give rise to an underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same across subpopulations and those that behave differently. Properly identifying such a structure enables us to better understand the covariates and their associations with the outcome, and our work offers an effective solution. The proposed approach can be implemented effectively, and its statistical properties have been studied: we provide non-asymptotic oracle results and establish consistency. Simulation demonstrates its superiority over the alternatives. In the analysis of cancer gene expression data, interesting models and structures missed by existing analyses are identified. Third, we aim at more consistent and accurate clustering results across multi-omics data. A penalization approach is proposed that borrows information from important regulators to assist in clustering. Our method is designed under a mixture model structure, which is oriented toward the heterogeneity and variety of omics data. Unlike the existing literature, which aims to model cancer outcomes or identify GE signals more accurately, the goal of our research is to obtain more accurate clustering results. In simulation and in the analysis of multidimensional omics data, the proposed method shows superior performance. |