| In recent years, several large-scale cancer genome projects have been launched internationally, such as the Cancer Cell Line Encyclopedia (CCLE), the Cancer Genome Project (CGP), and The Cancer Genome Atlas (TCGA), which have produced large-scale pharmacogenomic data. These data make it easier for researchers to use computational methods to mine the important information hidden in massive datasets. This paper uses the GDSC dataset to identify statistically and biologically significant gene-drug co-modules from high-dimensional gene expression data and anticancer drug response data, based on the classical partial least squares and non-negative matrix factorization algorithms. From the perspective of gene regulation, these co-modules help to clarify the molecular mechanisms of anticancer drug action and to screen potential drug targets. The partial least squares algorithm is favored by researchers because of its simplicity and ease of use. Studies have shown that the sparse partial least squares algorithm with gene network regularization constraints (SNPLS) can effectively identify gene-drug co-modules. However, that algorithm considers only the correlation information between genes and ignores the correlation between drugs. In this paper, we first transform the chemical structures of drugs into digital sequences, compute Jaccard similarity coefficients between these sequences, and then construct a drug association network. Next, we incorporate the information from the drug association network into the sparse partial least squares algorithm with a gene network, and present the sparse partial least squares algorithm with gene and drug association networks (SGDPLS), which we use to identify gene-drug co-modules. The results show that, compared with SNPLS, the correlations between the gene modules and drug modules within the identified co-modules improve significantly owing to the incorporation of the drug association network, and the interpretability of the modules is enhanced. The non-negative matrix factorization algorithm is now 
widely used for feature extraction from data. Its advantage is that it can effectively reduce the dimensionality of data while retaining the key information. Starting from the latest gene expression and drug response data downloaded from the GDSC database, we obtain complete drug response data by imputing the missing values. The gene similarity matrix, drug similarity matrix, and gene-drug similarity matrix are obtained by computing Pearson correlation coefficients. The factor matrices for the gene and drug data are obtained with the joint non-negative matrix factorization algorithm (JNMF). Building on joint non-negative matrix factorization, we add similarity-matrix difference terms and use the correlations among multiple variables as constraints in the co-module identification framework, and present the sparse joint non-negative matrix factorization algorithm with similarity constraints (SSJNMF), which we use to identify gene-drug co-modules. We compare SSJNMF with two other non-negative matrix factorization algorithms, JNMF and NetNMF. The results show that the gene-drug co-modules identified by SSJNMF are non-random and have higher statistical significance and biological interpretability. |
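
The drug association network step described above (drug structures encoded as digital sequences, pairwise Jaccard coefficients, then a network) can be illustrated with a minimal sketch. It assumes each drug has already been encoded as a binary sequence of equal length; the function names, the toy sequences, and the 0.5 edge threshold are hypothetical choices for illustration and are not taken from this work.

```python
import numpy as np

def jaccard_matrix(sequences: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard coefficient between the rows of a binary matrix."""
    bits = sequences.astype(bool).astype(int)
    intersection = bits @ bits.T                      # shared set bits per pair
    counts = bits.sum(axis=1)                         # set bits per drug
    union = counts[:, None] + counts[None, :] - intersection
    # Two all-zero sequences have an empty union; similarity is set to 0 there.
    return np.divide(intersection, union,
                     out=np.zeros(union.shape, dtype=float),
                     where=union > 0)

def drug_network(similarity: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Adjacency matrix linking drugs whose similarity reaches the threshold."""
    adjacency = (similarity >= threshold).astype(int)
    np.fill_diagonal(adjacency, 0)                    # no self-loops
    return adjacency

# Toy binary sequences for three hypothetical drugs (rows).
sequences = np.array([
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
])
similarity = jaccard_matrix(sequences)
adjacency = drug_network(similarity, threshold=0.5)
```

In this toy case the first two drugs share two of three set positions and are therefore connected, while the third drug shares none and remains isolated. How the resulting network enters the SGDPLS penalty is not shown here.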
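
To make the joint factorization concrete, the following is a minimal sketch of plain joint non-negative matrix factorization with a shared factor, using standard multiplicative updates for the squared Frobenius error. It assumes two non-negative matrices that share their rows (samples), one for genes and one for drugs; the function name, the rank, the iteration count, and the random toy data are hypothetical. It does not include the similarity constraints or sparsity terms of SSJNMF.

```python
import numpy as np

def joint_nmf(X1, X2, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor X1 ~ W @ H1 and X2 ~ W @ H2 with a shared non-negative W."""
    rng = np.random.default_rng(seed)
    W = rng.random((X1.shape[0], rank))
    H1 = rng.random((rank, X1.shape[1]))
    H2 = rng.random((rank, X2.shape[1]))
    errors = []
    for _ in range(n_iter):
        # Shared factor: numerator and denominator pool both data matrices.
        W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
        # Matrix-specific factors: ordinary NMF updates with W held fixed.
        H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
        H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
        errors.append(np.linalg.norm(X1 - W @ H1) ** 2
                      + np.linalg.norm(X2 - W @ H2) ** 2)
    return W, H1, H2, errors

# Toy non-negative data: 20 samples, 30 "genes", 8 "drugs".
rng = np.random.default_rng(1)
X1 = rng.random((20, 30))
X2 = rng.random((20, 8))
W, H1, H2, errors = joint_nmf(X1, X2, rank=3)
```

Each row of `H1` and the matching row of `H2` weight genes and drugs on the same latent component, which is what allows a gene module and a drug module to be read off together. Real expression data may need shifting or another transformation to satisfy the non-negativity assumption.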