Gene expression is the most basic level at which genotype gives rise to phenotype, and it is critical to the development of organisms. With the successful completion of the Human Genome Project, gene expression data have proliferated, and analyzing and processing these data has become a major bottleneck in exploring their applications. In precision medicine, accurate prediction and identification of drug targets and biomarkers of malignant tumors are of great significance for clinical treatment and for curing cancer. Assessing whether a new individual has cancer is more efficient and less costly than testing for genetic variations, so it makes sense to develop methods that use theoretical knowledge and computational techniques to predict cancer and identify associated genes. Building on regularization techniques from statistical learning and incorporating two kinds of prior genomic knowledge, pathway information and interaction-network information, this paper develops two types of models for cancer classification and associated-gene identification. The main contributions are as follows:

1. Cancer prediction and gene identification based on stacked sparse group lasso. Biochemical pathways are molecular mechanisms through which intracellular and extracellular reaction networks control cellular components; they jointly regulate the expression of certain genes, proteins, or compounds and thereby shape different phenotypes. Using gene pathway information from GSEA, we construct the Stacked SGL model by combining a stacking ensemble strategy with the sparse group lasso; the model shows stable, strong predictive performance in cancer classification. In addition, a Stacked-hoc model is built to strengthen the feature inference of Stacked SGL. Simulation experiments and a cancer case study show that Stacked SGL selects relevant features more effectively, identifies more potential mutant genes, enhances model interpretability, and improves prediction performance.

2. Cancer prediction and gene identification based on a weighted elastic net (WEN). Many biological processes can be represented as graph networks, such as regulatory networks, metabolic pathways, and protein-protein interaction (PPI) networks. Pathway overlap is common in biological network analysis and may produce highly correlated activity among the biomarkers of overlapping pathways. To address this, we construct multiple independent PPI networks and score the importance weight of each gene by how frequently it appears as a node across all networks and by its degree of connectivity in the networks it participates in. Simulation results show that, compared with other models, WEN has lower sparsity and better prediction performance. In a real-data study, WEN infers subnetworks of the MAPK and PI3K-Akt pathways, which are closely related to the pathogenesis of thyroid cancer.
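The sparse group lasso underlying Stacked SGL combines a group-level L2 penalty (here, one group per pathway) with an element-wise L1 penalty, so it can drop whole pathways and also individual genes inside retained pathways. The plain-Python sketch below shows the standard SGL penalty and its proximal operator, the step a proximal-gradient solver for an SGL-penalized classifier would apply after each gradient update. It assumes non-overlapping groups that together cover every feature, the common sqrt(group size) group weighting, and a mixing parameter `alpha`; the function names are hypothetical, and the stacking layer of Stacked SGL is not reproduced here.

```python
import math
from typing import Dict, List


def soft_threshold(x: float, thresh: float) -> float:
    """Scalar soft-thresholding: the proximal operator of thresh * |x|."""
    if x > thresh:
        return x - thresh
    if x < -thresh:
        return x + thresh
    return 0.0


def sgl_penalty(beta: List[float], groups: Dict[str, List[int]],
                lam: float, alpha: float) -> float:
    """Sparse group lasso penalty:
    lam * ((1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2 + alpha * ||beta||_1).
    Assumes non-overlapping groups (e.g. one group per pathway)."""
    l1 = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        group_term += math.sqrt(len(idx)) * norm
    return lam * ((1.0 - alpha) * group_term + alpha * l1)


def sgl_prox(beta: List[float], groups: Dict[str, List[int]],
             lam: float, alpha: float, step: float) -> List[float]:
    """Proximal operator of step * sgl_penalty, applied group by group.
    Assumes every feature index belongs to exactly one group; indices not
    covered by any group are returned as 0.0."""
    out = [0.0] * len(beta)
    for idx in groups.values():
        # 1) Element-wise L1 shrinkage inside the group.
        z = [soft_threshold(beta[j], step * lam * alpha) for j in idx]
        # 2) Group-level shrinkage: the whole group is zeroed if its norm is small.
        norm = math.sqrt(sum(v * v for v in z))
        thresh = step * lam * (1.0 - alpha) * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - thresh / norm) if norm > 0.0 else 0.0
        for j, v in zip(idx, z):
            out[j] = scale * v
    return out
```

For example, `sgl_prox([3.0, -0.5, 0.2, 0.1], {"pathway_A": [0, 1], "pathway_B": [2, 3]}, lam=1.0, alpha=0.5, step=1.0)` returns approximately `[1.793, 0.0, 0.0, 0.0]`: the weak group `pathway_B` is removed entirely, and the small coefficient inside `pathway_A` is zeroed while the strong one survives.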
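The WEN description scores each gene from how often it appears across the independent PPI networks and how well connected it is within them, but it does not give the exact formula. The sketch below is one hypothetical instantiation: importance = (fraction of networks containing the gene) × (mean degree over those networks, normalized by n - 1), turned into per-gene penalty weights w = 1 / (1 + importance) so that better-supported genes are penalized less in a weighted elastic net penalty lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2). The product rule, the weight transform, and all function names are assumptions for illustration, not the WEN definition.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def gene_importance(networks: List[List[Edge]]) -> Dict[str, float]:
    """Hypothetical importance score per gene:
    (fraction of networks containing the gene)
    * (mean normalized degree of the gene over those networks)."""
    k = len(networks)
    freq: Dict[str, int] = {}
    degree_sum: Dict[str, float] = {}
    for edges in networks:
        # Build an undirected adjacency set for this PPI network.
        adj: Dict[str, Set[str]] = {}
        for u, v in edges:
            if u == v:
                continue  # ignore self-loops
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        n = len(adj)
        for gene, nbrs in adj.items():
            freq[gene] = freq.get(gene, 0) + 1
            # Degree normalized by the maximum possible degree (n - 1).
            degree_sum[gene] = degree_sum.get(gene, 0.0) + (
                len(nbrs) / (n - 1) if n > 1 else 0.0)
    return {g: (freq[g] / k) * (degree_sum[g] / freq[g]) for g in freq}


def penalty_weights(scores: Dict[str, float],
                    genes: Iterable[str]) -> Dict[str, float]:
    """Hypothetical transform: higher importance -> smaller penalty weight.
    Genes absent from every network get score 0 and weight 1."""
    return {g: 1.0 / (1.0 + scores.get(g, 0.0)) for g in genes}


def weighted_enet_penalty(beta: Dict[str, float], weights: Dict[str, float],
                          lam: float, alpha: float) -> float:
    """lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j ** 2)."""
    return lam * sum(
        weights[g] * (alpha * abs(b) + 0.5 * (1.0 - alpha) * b * b)
        for g, b in beta.items())
```

With two toy networks, `[("G1", "G2"), ("G1", "G3")]` and `[("G1", "G4")]`, `gene_importance` gives G1 a score of 1.0 (present in both networks and linked to every other node), G4 a score of 0.5, and G2 and G3 0.25 each, so G1 receives the smallest penalty weight, 0.5. A solver would then use these weights in place of a uniform elastic net penalty.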