| With the rapid development of high-throughput sequencing technology, the large amount of accumulated high-quality cancer data has promoted cancer research at the genome level but has also brought great challenges. As a particularly complex disease, cancer is not determined in its occurrence and development by a single gene but involves a series of genes and their complex interactions. Therefore, an important issue in cancer research is the study of interactions at the molecular level and the construction of interaction networks, which are closely related to the occurrence, development, and prognosis of cancer. Although existing research has made important progress in biological network construction and interaction analysis, many limitations remain. On the one hand, reconstructing biological networks from high-dimensional data is challenging because of unrecognized sample heterogeneity and the underlying common structure shared by different subtypes of the same cancer. Existing methods are still not effective in practice because of their high computational cost or stringent assumptions. On the other hand, because of the high dimensionality of genetic measurements, existing interaction analysis methods usually lack sufficient information and remain unsatisfactory. The massive accumulation of biological network information allows researchers to identify biomarkers from a systems perspective by exploiting network selection (composed of functionally relevant biomarkers) as well as network structure. Network information has been widely incorporated in the analysis of genetic main effects. However, large gaps remain in the analysis of interactions. In addition, there are fewer studies on survival time, given its challenging characteristics such as censoring. Two-level analysis of genes and their associated pathways has received extensive attention in recent biomedical research and has been shown to be more efficient than single-level analysis, but such analysis is usually limited to main
effects. Pathways are not isolated, and their interactions have also been suggested to make important contributions to the prognosis of complex diseases. In response to these problems, this dissertation develops methods for interaction network construction on heterogeneous cancer data (Chapter 2) and for genetic interaction analysis with continuous responses (Chapter 3) and survival data (Chapter 4). In Chapter 2, we develop a new joint estimation approach for multiple networks by solving a collection of sparse regression problems in the presence of undiscovered sample heterogeneity. Under the framework of the Gaussian graphical model, we propose a regularized sparse group lasso model based on the square-root loss function, effectively accommodating the specific and common information across the networks of multiple subgroups. Significantly advancing beyond existing heterogeneity network analysis based on likelihood regularization, the proposed approach enjoys computational simplicity, scalability, and a potential asymptotic tuning-free property. An efficient Expectation-Maximization + Proximal Newton algorithm is developed, which is highly desirable for high-dimensional analysis. The estimation and selection consistency properties of the proposed estimators are rigorously established. Extensive numerical experiments with simulated data and breast cancer data demonstrate the strong performance of the proposed approach in both subgroup and network identification. In Chapter 3, we develop a novel structured Bayesian interaction analysis approach that effectively incorporates network information. This study is among the first to identify gene-gene interactions with the assistance of network selection for phenotype prediction, while simultaneously accommodating the underlying network structures of both main effects and interactions. It respects the multiple hierarchies among main effects, interactions, and networks. A Bayesian method is adopted, which offers a more
informative approach to estimation and prediction than some other techniques, such as penalization. An efficient variational Bayesian expectation-maximization algorithm is developed to explore the posterior distribution. Extensive simulation studies demonstrate the practical superiority of the proposed approach. The analysis of real data on melanoma and lung cancer leads to biologically sensible findings with satisfactory prediction accuracy and selection stability. In Chapter 4, we develop a novel two-level Bayesian interaction analysis approach for survival data. This approach is the first to analyze lower-level gene-gene interactions and higher-level pathway-pathway interactions simultaneously. Significantly advancing beyond existing Bayesian studies based on the Markov chain Monte Carlo technique, we propose a variational inference framework based on the accelerated failure time model, with favourable priors that account for two-level selection as well as censoring. Its computational efficiency is highly desirable for high-dimensional interaction analysis. We examine the performance of the proposed approach through extensive simulations. Application to melanoma and lung adenocarcinoma data leads to biologically sensible findings with satisfactory prediction accuracy and selection stability. |