Two Types Of Gene-drug Co-module Identification Algorithms

Posted on:2020-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Mao

Full Text:PDF

GTID:2404330599959947

Subject:Computational Mathematics

Abstract/Summary:

In recent years, several large-scale cancer genome projects have been launched internationally, such as the Cancer Cell Line Encyclopedia (CCLE), the Cancer Genome Project (CGP), and The Cancer Genome Atlas (TCGA). These projects have produced large-scale pharmacogenomic data, which make it easier for researchers to use computational methods to extract important information from massive datasets. This thesis uses the GDSC dataset to identify statistically and biologically significant gene-drug co-modules from high-dimensional gene expression data and anti-cancer drug response data, based on the classical partial least squares and non-negative matrix factorization algorithms. From the perspective of gene regulation, these co-modules help clarify the molecular mechanisms of anticancer drug action and support screening for potential drug targets.

The partial least squares algorithm is favored by researchers for its simplicity and ease of use. Studies have shown that the sparse partial least squares algorithm with gene network regularization constraints (SNPLS) can effectively identify gene-drug co-modules. However, that algorithm considers only the correlation information between genes and ignores the correlation between drugs. In this thesis, we first transformed the chemical structures of drugs into digital sequences, computed Jaccard coefficients between those sequences, and constructed a drug association network. We then incorporated the drug association network into the sparse partial least squares algorithm with a gene network, yielding a sparse partial least squares algorithm with gene and drug association networks (SGDPLS), and used it to identify gene-drug co-modules. The results showed that, compared with SNPLS, the correlations between the gene modules and drug modules within the identified co-modules improved significantly due to the incorporation of the drug association network, and the interpretability of the modules was enhanced.

The non-negative matrix factorization algorithm is now widely used in data feature extraction. Its advantage is that it can effectively reduce the dimensionality of data while retaining the key information. Starting from the latest gene expression data and drug response data downloaded from the GDSC database, complete drug response data were obtained by filling in the missing values. A gene similarity matrix, a drug similarity matrix, and a gene-drug similarity matrix were obtained by calculating Pearson correlation coefficients. The decomposition factors for the gene and drug information were obtained with the joint non-negative matrix factorization algorithm (JNMF). Building on joint non-negative matrix factorization, we added similarity matrix difference terms and combined the correlations among multiple variables as constraints on the co-module identification framework, yielding a sparse joint non-negative matrix factorization algorithm with similarity constraints (SSJNMF), and used it to identify gene-drug co-modules. SSJNMF was compared with two non-negative matrix factorization algorithms, JNMF and NetNMF. The results show that the gene-drug co-modules identified by SSJNMF are non-random and have higher statistical significance and biological interpretability.
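
As an illustration of the drug association network step described in the abstract, the following is a minimal sketch and not the thesis's actual implementation. It assumes each drug's "digital sequence" is a binary structural fingerprint, and it assumes an edge is drawn when the Jaccard coefficient reaches a cutoff. The function names `jaccard_similarity` and `drug_network`, the toy fingerprints, and the `threshold` value of 0.5 are all hypothetical; the abstract does not state how the network is derived from the coefficients.

```python
import numpy as np


def jaccard_similarity(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard coefficients between the rows of a binary matrix."""
    fp = fingerprints.astype(bool).astype(int)
    intersection = fp @ fp.T                    # bits set in both drugs
    counts = fp.sum(axis=1)                     # bits set in each drug
    union = counts[:, None] + counts[None, :] - intersection
    # An empty union (two all-zero fingerprints) is assigned similarity 0.
    return np.divide(
        intersection,
        union,
        out=np.zeros(union.shape, dtype=float),
        where=union > 0,
    )


def drug_network(fingerprints: np.ndarray, threshold: float = 0.5):
    """Return (similarity, adjacency); the cutoff rule is an assumption."""
    sim = jaccard_similarity(fingerprints)
    adj = (sim >= threshold).astype(int)
    np.fill_diagonal(adj, 0)                    # no self-loops
    return sim, adj


# Toy fingerprints for four hypothetical drugs (rows) over six bits.
fingerprints = np.array([
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
])

similarity, adjacency = drug_network(fingerprints, threshold=0.5)
print(np.round(similarity, 3))
print(adjacency)
```

In this toy input, drugs 0 and 1 share two of three set bits, as do drugs 2 and 3, so each pair is linked and the two pairs stay disconnected. A network of this kind could then serve as the drug-side regularizer, analogous to the gene network used by SNPLS.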

Keywords/Search Tags:

Partial least square algorithm, non-negative matrix factorization algorithm, drug association network, similarity matrix, gene module, drug module, gene-drug co-module

Related items

1	Modular Feature Matching Method Of High Dimensional Omics Data Based On Network Analysis
2	Algorithm For Layer-specific Module Detection In Multi-layer Cancer Networks
3	Research On Drug Mining Method Based On Integrated Matrix Factorization Algorithm
4	Research And Implementation On Carcinogenic Gene Module Identification Method Of Glioblastoma Based On Integrating Multi-Omics Data
5	Research On Drug Response Prediction Algorithm Based On Matrix Factorization
6	Research On Mining Of Gene Driven Patterns
7	Non-negative Matrix Factorization Algorithm To Deal With The Cancer Gene Expression Data
8	Feature Extraction Of Cancer Gene Expression Data Based On Non-negative Matrix Factorization
9	Computational Models Of Anticancer Drug Response Prediction Based On High-throughput Sequencing
10	Research On Drug Recommendation Algorithm