Research On Identifying Driver Genes Based On Multi-omics Data Integration

Posted on:2020-12-12

Degree:Master

Type:Thesis

Country:China

Candidate:Q M Miao


GTID:2404330620951108

Subject:Computer Science and Technology

Abstract/Summary:


Cancer is a complex disease that poses a serious threat to human life. Breakthroughs in high-throughput sequencing technology have reduced research costs in cancer diagnosis, clinical treatment, and prognosis prediction. Integrating multi-omics and high-throughput data makes systematic, comprehensive analysis of cancer possible and allows the process of cancer development to be studied more deeply and completely. From a genetic perspective, cancer results from the continuous selection and accumulation of genetic mutations. Integrating multi-omics data to mine cancer-related genes and cancer driver genes has therefore become a focus of research on cancer pathogenesis. This thesis proposes two methods for identifying driver genes; the main work comprises the following two points.

(1) A method based on overlapping community detection (GCommunity) is proposed to mine gene communities with overlapping characteristics and to identify cancer-related driver genes. First, EMDomics is used to analyze differential expression in highly heterogeneous cancer data, and the genes with significant differential expression are selected as input genes. A Gibbs sampler is then used to construct a gene interaction network from the gene expression data, and protein-protein interaction (PPI) data are added to complete the network. An overlapping community detection algorithm is applied to mine the final gene communities. Candidate driver genes are selected by calculating the frequency of copy number variation. A regression tree model is then applied to establish the regulatory relationship between the candidate driver genes and the gene communities, which yields the cancer driver genes. In short, GCommunity obtains gene interaction relationships from genomic and proteomic data, analyzes the mutation behavior of genes from copy number variation data, and establishes the regulatory relationship between mutated genes and gene communities with a probabilistic statistical model. The experimental results show that GCommunity can mine high-quality, biologically meaningful gene communities and that the identified driver genes have driving significance.

(2) A somatic mutation-based method for detecting cancer driver genes (MaxSIF) is proposed, which integrates gene expression data, protein-protein interaction data, and somatic mutation data. First, the method uses a correction factor to remove the background noise contributed by silent mutations. The mutation score is then calculated from the proportions of nonsense mutations, missense mutations, frame-shift indels, and in-frame deletions in the nucleotide sequence. The gene interaction network built from the expression data and the protein-protein interaction data is combined with the mutation scores: a mutation influence score is calculated between each gene and each of its neighbor nodes, and the maximum of these values is taken as the mutation influence score of the gene. Finally, genes with a high SIF are selected as driver genes. The motivation for MaxSIF is that if two genes both have high mutation scores and are close to each other in the gene network, they should have strong mutation effects. The method therefore takes the mutational effects of all neighbors in the gene network into account when calculating the maximum mutational effect of a gene. The experimental results show that the driver genes identified by MaxSIF are significantly enriched in cancer pathways, and that the method can correctly identify driver genes and distinguish oncogenes from tumor suppressor genes.
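The neighbor-maximum idea behind MaxSIF can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything concrete here is an assumption made for illustration: the pairwise influence is taken to be the product of the two mutation scores and an edge weight (higher weight meaning closer genes), only direct neighbors are considered, and the gene names `G1` to `G4`, the scores, and the function name `max_influence` are hypothetical.

```python
from typing import Dict, Tuple


def max_influence(
    scores: Dict[str, float],
    edges: Dict[Tuple[str, str], float],
) -> Dict[str, float]:
    """Return, for each gene, its largest pairwise influence over direct neighbours.

    ASSUMPTION: pairwise influence = score[a] * score[b] * edge weight.
    This is an illustrative stand-in, not the formula used in the thesis.
    """
    best = {gene: 0.0 for gene in scores}
    for (a, b), weight in edges.items():
        pair = scores[a] * scores[b] * weight
        # The network is undirected, so the pair value counts for both ends.
        best[a] = max(best[a], pair)
        best[b] = max(best[b], pair)
    return best


# Hypothetical toy inputs: per-gene mutation scores and weighted interactions.
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.5}
edges = {("G1", "G2"): 1.0, ("G1", "G3"): 1.0, ("G3", "G4"): 0.5}

influence = max_influence(scores, edges)
# Rank genes by influence; the top of the list would be the driver candidates.
ranked = sorted(influence, key=influence.get, reverse=True)
print(ranked)
```

In this toy network, `G1` and `G2` both have high mutation scores and are directly connected, so they rank above `G3` and `G4`, which matches the stated motivation. A fuller version would presumably weight by network distance beyond direct neighbors, which this sketch leaves out.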

Keywords/Search Tags:

driver gene, data integration, overlapping community detection, somatic mutation, mutation impact


Related items

1	Study Of Methods Of Driver Mutation Clusters Identification Based On Analysis Of Multiple Cancer
2	Study Of The Method Of Mining The Patterns Of Driver Mutation In Pan-cancer
3	Research On Screening Method Of Cancer Driver Gene Sets Based On High-throughput Sequencing Data
4	A Tumor Somatic Mutation Detection Method Based On Next-generation Sequencing Data
5	The Landscape Of Somatic Mutations And The Functions Of Major Mutant Genes In Acral Melanoma
6	Identification Of Melanoma Driver Mutation And Construction Of Related Information Platform
7	The Research Of Approaches For Identifying Mutated Driver Pathways In Cancer
8	Study On The Detection Of Driver Gene Mutation In Sputum Supernatant Of Patients With Non-small Cell Lung Cancer
9	Correlation Analysis Of The Distribution And Clinical Characteristics Of Driver And Non-Driver Genes In Patients With Myeloproliferative Neoplasms
10	Study On The Ensemble Learning Method For Uncovering Cancer Driver Genes