Research On Models For High Dimensional Data With Structural Information

Posted on:2020-05-17

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M Q Liu

Full Text:PDF

GTID:1480305741964889

Subject:Statistics

Abstract/Summary:

Variety and heterogeneity are basic characteristics of big data. Weak signals and high noise often make model-based analysis less effective. In real data analysis, the inherent structural information of the data can serve as very effective auxiliary information. At the same time, the structural characteristics of data are often themselves the research goal of the analysis. It is therefore necessary to consider structural information in methodological research on high-dimensional data. Before presenting our study, we review in detail the cure rate model, the finite mixture of regression model, and multi-source data analysis methods. These models and methods are oriented toward data heterogeneity or variety. Under these model frameworks, our penalty function-based study explores how to use and extract structural information.

Specifically, we first propose a penalization method for estimating the mixture cure rate model in which we explicitly consider the structural effects of covariates. Unlike the existing literature, which puts a strict constraint on covariate effects, we study the structures of covariate effects by examining the magnitudes of regression coefficients and allow them to be similar but not equal, which is more flexible. Depending on data characteristics, we develop different penalties and corresponding computational algorithms. The proposed method has an intuitive formulation and can be effectively realized. Simulation shows that the proposed method outperforms the alternatives by more accurately estimating parameters and identifying relevant variables. Two breast cancer datasets, one with low-dimensional clinical variables and the other with high-dimensional genetic variables, are analyzed. With an angle different from that of existing studies and advantages demonstrated in simulation and data analysis, the proposed method is warranted in addition to the existing ones.

Second, for the finite mixture of regression model, we develop a structured penalization approach that adapts to high-dimensional data and provides regularized estimation, selection of important variables, and, equally importantly, identification of the underlying covariate effect structure. The existing literature pays little attention to the differences among important covariates, which can give rise to an underlying structure of covariate effects. Specifically, important covariates can be classified into two types: those that behave the same in different subpopulations and those that behave differently. Properly identifying such a structure enables us to better understand covariates and their associations with the outcome. Our work offers an effective solution. The proposed approach can be effectively realized, and its statistical properties have been studied: we provide non-asymptotic oracle results and establish consistency. Simulation demonstrates its superiority over alternatives. In the analysis of cancer gene expression data, interesting models/structures missed by existing analyses are identified.

Third, we aim at more consistent and accurate clustering results across multi-omics data. A penalization approach is proposed that borrows information from important regulators to assist in clustering. Our method is designed under a mixture model structure, which is oriented toward the heterogeneity and variety of omics data. Unlike the existing literature, which targets modeling cancer outcomes or identifying gene expression (GE) signals more accurately, our goal is to obtain more accurate clustering results. In simulation and the analysis of multidimensional omics data, the proposed method shows superior performance.

Keywords/Search Tags:

Structural information, Penalization, Cure rate model, Finite mixture of regression model, Multi-omics data

Related items

1	Variable Selection In Marginal Regression Mixture Cure Model For Clustered Failure Time Data With A Cure Fraction
2	Statistical Inference Of The Mixture Cure Model For Survival Data
3	Analysis Of The Influencing Factors Of The Stock Price Index Under The Mixture Cure Model
4	The Proportional Structure Of The Covariate In The Mixed Cure Rate Model With Penalization
5	Feature Selection For Semi-parametric Mixture Cure Model
6	Joint Model Of Longitudinal Measurements And Survival Times With A Cure Fraction
7	Comparative And Analysis Of Mixture Cure Model In R
8	Efficient Estimation Of The Partially Linear Non-mixture Cure Model With Auxiliary Subgroup Survival Information
9	Conditional Sure Independent Screening Based On Finite Mixture Model
10	Statistical Analysis Of The Incubation Period Of COVID-19 Based On The Generalized Odds Rate Mixture Cure Model