
Research On Identifying Subtype-specific Driver Genes Based On Multivariate Data Integration

Posted on: 2019-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2370330545469672
Subject: Computer Science and Technology
Abstract/Summary:
Breast cancer is one of the most common malignant tumors in women and is known to be a heterogeneous disease. Identifying subtype-specific driver genes is critical for guiding the diagnosis, prognostic assessment, and treatment of breast cancer. With the development of high-throughput sequencing technology, many large-scale genomics projects have accumulated large amounts of data for various cancer types, including genomic, transcriptomic, and proteomic data, which provides an unprecedented opportunity for a comprehensive interpretation of the molecular mechanisms of breast cancer development. A major challenge in studying cancer progression is to identify the driver genes among the many mutated genes by analyzing and integrating multiple types of genomic data. At present, there is no robust model for extracting the driver genes and driver pathways of breast cancer subtypes from these highly heterogeneous and strongly correlated data. This thesis therefore proposes two methods for integrating genomic data, which explore, at the level of molecular subtypes, the molecular mechanisms related to patient prognosis. The main innovations and research achievements are as follows:

(1) An integrative method based on a module network is proposed to identify subtype-specific driver genes. First, to cope with the high heterogeneity of breast cancer data, a subset of differentially expressed genes is selected by EMD analysis, and candidate modulator genes are selected using a frequency-based method. Then, for each subtype, initial modules are constructed by clustering, and a heterogeneous network is built through module network learning. The multi-omics regulatory relationships are modeled with regression trees. Finally, the subtype-specific driver genes are obtained by pairwise comparison between subtypes. To validate the specificity of these driver genes, their expression data were used to classify patient samples with 10-fold cross-validation, and enrichment analysis was also performed on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and that a classifier built on these genes performed better than classifiers built on genes identified by other methods.

(2) A new model based on a pathway network is proposed to identify driver pathways and driver genes. Driver mutations of each subtype are found by integrating copy number variation, a gene interaction network, and homology information. First, the initial weight of each gene is calculated from its frequency across all pathways and its degree in the pathway network. Then, scores for the nodes and edges of the pathway network are calculated from the copy number variation and homology information, and from these the score of the whole pathway network is obtained. The important driver pathways and genes are selected by ranking these scores. To validate the specificity of the selected driver genes, classification between subtypes is performed. Pathway activity analysis and enrichment analysis are used to assess the potential biological significance of the driver mutations. The experimental results show that the integrative method is effective, provides a reference for the treatment of breast cancer subtypes, and is of great significance for understanding the pathogenesis of breast cancer.
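The EMD-based gene selection step in method (1) can be illustrated with a minimal sketch. It assumes that EMD here means the Earth Mover's Distance (the 1-D Wasserstein-1 distance) and that each gene is scored by the distance between its expression distribution in one subtype and in all remaining samples; the abstract does not specify either detail. The function names `emd_1d` and `select_de_genes`, the top-k cutoff, and the toy data are hypothetical.

```python
from bisect import bisect_right


def emd_1d(u, v):
    """Earth Mover's (Wasserstein-1) distance between two 1-D samples.

    Computed as the integral of |F_u(x) - F_v(x)|, where F_u and F_v are the
    empirical CDFs; both are step functions, so the integral is a finite sum.
    """
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u, v = sorted(u), sorted(v)
    points = sorted(u + v)
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


def select_de_genes(expr, labels, subtype, top_k):
    """Rank genes by EMD between one subtype and all other samples.

    Hypothetical scoring rule: the abstract only states that EMD analysis
    selects differentially expressed genes.
    expr maps gene -> list of expression values (one per sample);
    labels gives the subtype of each sample, in the same order.
    """
    scores = {}
    for gene, values in expr.items():
        inside = [x for x, lab in zip(values, labels) if lab == subtype]
        outside = [x for x, lab in zip(values, labels) if lab != subtype]
        scores[gene] = emd_1d(inside, outside)
    # Larger distance means more subtype-specific expression; ties broken by name.
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return ranked[:top_k], scores


if __name__ == "__main__":
    labels = ["LumA", "LumA", "Basal", "Basal"]  # toy data, not from the thesis
    expr = {
        "G1": [5.0, 5.2, 1.0, 1.1],
        "G2": [2.0, 2.1, 2.0, 2.1],
        "G3": [0.0, 0.5, 3.0, 3.5],
    }
    top, scores = select_de_genes(expr, labels, "LumA", top_k=2)
    print(top)  # ['G1', 'G3']
```

G2 has the same distribution in both groups and scores 0, so it is dropped; a real pipeline would typically pick the cutoff by permutation testing or a fixed gene budget rather than a hand-set `top_k`.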
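Method (2) scores pathways from gene weights, node scores, and edge scores, but the abstract does not give the formulas. The sketch below fills them in with illustrative assumptions only: a gene's initial weight is the fraction of pathways containing it times a smoothed, normalized network degree; a node score is that weight times the gene's copy-number alteration frequency in the subtype; an edge score is a homology similarity times the mean of its two endpoint node scores; and a pathway's score is the sum of its node scores and internal edge scores. The names `initial_weights` and `score_pathways` and all data are hypothetical.

```python
from collections import Counter


def initial_weights(pathways, edges):
    """Assumed initial gene weight: pathway frequency x smoothed degree.

    pathways maps pathway name -> set of genes; edges lists gene pairs in
    the pathway network. Both factors are scaled to [0, 1].
    """
    freq = Counter(g for genes in pathways.values() for g in genes)
    degree = Counter()
    for g, h in edges:
        degree[g] += 1
        degree[h] += 1
    max_deg = max(degree.values(), default=0)
    # +1 smoothing (an assumption) keeps genes without edges from scoring zero.
    return {
        g: (freq[g] / len(pathways)) * ((degree[g] + 1) / (max_deg + 1))
        for g in freq
    }


def score_pathways(pathways, edges, cnv_freq, homology):
    """Score nodes, edges, and whole pathways, then rank them (hypothetical rules).

    cnv_freq maps gene -> fraction of subtype samples with a copy number
    alteration; homology maps a gene pair -> similarity in [0, 1].
    """
    weights = initial_weights(pathways, edges)
    node = {g: w * cnv_freq.get(g, 0.0) for g, w in weights.items()}
    edge = {}
    for g, h in edges:
        # Homology is treated as symmetric: look the pair up in either order.
        sim = homology.get((g, h), homology.get((h, g), 0.0))
        edge[(g, h)] = sim * (node.get(g, 0.0) + node.get(h, 0.0)) / 2
    pathway_score = {}
    for name, genes in pathways.items():
        nodes_part = sum(node.get(g, 0.0) for g in genes)
        edges_part = sum(v for (g, h), v in edge.items() if g in genes and h in genes)
        pathway_score[name] = nodes_part + edges_part
    ranked_pathways = sorted(pathway_score, key=lambda p: (-pathway_score[p], p))
    ranked_genes = sorted(node, key=lambda g: (-node[g], g))
    return ranked_pathways, ranked_genes, pathway_score


if __name__ == "__main__":
    pathways = {"P1": {"A", "B", "C"}, "P2": {"C", "D"}}  # toy data
    edges = [("A", "B"), ("B", "C"), ("C", "D")]
    cnv_freq = {"A": 0.1, "B": 0.6, "C": 0.5, "D": 0.0}
    homology = {("B", "C"): 0.8}
    paths, genes, scores = score_pathways(pathways, edges, cnv_freq, homology)
    print(paths, genes[:2])  # ['P1', 'P2'] ['C', 'B']
```

Summing raw scores favors large pathways; a normalization by pathway size or a permutation-based significance test would be a natural variant, but which one the thesis uses is not stated in the abstract.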
Keywords/Search Tags: driver genes, data integration, module network, pathway network