
Genome-Wide DNA Methylation Patterns And Its Application To Complex Diseases

Posted on: 2017-03-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Y Zhang
GTID: 1224330488457228
Subject: Computer application technology
Abstract/Summary:
With the rapid development of biological science and computer technology, researchers increasingly seek to understand the pathogenesis of complex diseases. The rise of epigenetics provides an opportunity to understand disease mechanisms further. DNA methylation, an important epigenetic modification, plays an essential role in gene transcription, cell differentiation, aging and tumor development. Identifying differentially methylated patterns between cases and controls, finding disease-related biomarkers by integrating DNA methylation and gene expression data, and systematically studying the relationship between DNA methylation and gene expression in disease all help to clarify disease mechanisms. This dissertation studies differentially methylated patterns and their application to complex diseases. The main contributions are outlined below.

1. A new measure, QDML, based on relative entropy, is developed to identify differentially methylated loci (DML). It identifies DML more accurately than other methods.
QDML is designed around two characteristics of DNA methylation data: non-normal distribution and high heterogeneity. Unlike some statistical methods, the measure does not require a presupposed distribution for the methylation data, and it quantifies the size of the difference at each DML. The sign of the measure distinguishes hypermethylated from hypomethylated loci. A theoretical derivation shows that the measure is effective on highly heterogeneous data. Simulation studies and applications to real data show that it achieves higher accuracy and a lower false positive rate in identifying DML than some statistical methods.

2. A method based on distance discriminant analysis is proposed to identify differentially methylated regions (DMRs).
It has higher sensitivity and specificity in identifying DMRs than Bumphunting and Ong’s methods.
The procedure does not require clustering methylation sites or partitioning the genome in advance; it only estimates how well a region discriminates cases from controls. The identified DMRs are therefore not limited in size and tend to be larger than those found by other methods. A comparison with Bumphunting and Ong’s methods on simulated data shows that our method has higher sensitivity and specificity and is more powerful in identifying DMRs that span a larger genomic distance or consist of only a few sites. It is also more robust to data heterogeneity. Applying the method to several real datasets and integrating gene expression data, we identify possible functional DMRs. Most of them are hypermethylated and located in CpG-rich regions, which is consistent with the observation that CpG islands have higher methylation levels in tumors than in normal samples. By analyzing the functional DMRs of genes in different diseases, we identify possible pathogenic mechanisms of those diseases.

3. A weighted network-based method is developed to identify disease-related genes and gene modules. DNA methylation and gene expression data are integrated to calculate the network weights. With it, we identify some disease-related genes and gene modules.
A protein-protein interaction (PPI) network is used as the prior gene network. We treat a gene's expression value and the methylation levels of all CpG sites in the gene as its features, and calculate the edge weights of the gene network from these features using principal component analysis (PCA) and canonical correlation analysis (CCA), separately for case and control samples. This yields two weighted gene networks.
Comparing the structural features of these two weighted networks, we identify genes that differ significantly between them as disease-related genes. Disease-related gene modules are then identified from the subnetwork formed by these genes and their neighbors. Because the edge weights are calculated from all methylation sites rather than from their average values, all methylation information is retained. Applied to real breast cancer data, the method identifies many genes already known to be related to breast cancer, as well as some new candidate breast cancer-related genes and gene modules.

4. A computational method based on differential analysis is proposed to study the relationship between DNA methylation and gene expression. It provides a scientific basis for understanding the epigenetic regulation of complex diseases.
We study the relationship between DNA methylation and gene expression in seven cancers to explain the effect of DNA methylation on gene expression. We calculate the differences in gene expression and in DNA methylation across different gene regions, and we also analyze the relationship between the two. Applied to seven real datasets, the analysis shows that methylation regulates gene expression differently in different gene regions: the relationships are not only negative but in some cases positive. Changes in gene expression are associated with the gene body more frequently than with other regions, which may help explain the mechanism of cancer. In addition, the regions of cancer-related genes with the largest differences in DNA methylation are always TSS1500, the gene body and the 3’UTR, suggesting that these three regions are the most likely to be related to cancer.
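
The per-region differential analysis described in contribution 4 can be sketched roughly as follows. This is a minimal illustration, not the dissertation's actual method: the abstract does not name its statistics, so the mean difference between cases and controls and the Pearson correlation used here are assumptions, and the function names (`pearson`, `region_summary`) and all numbers are hypothetical toy values.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed statistic)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series; reported as 0 here
    return cov / (sx * sy)


def region_summary(meth_case, meth_ctrl, expr_case, expr_ctrl):
    """Summarise one gene region.

    meth_*: per-sample methylation level of the region (lists of beta values)
    expr_*: per-sample expression of the gene, in the same sample order
    """
    delta_meth = mean(meth_case) - mean(meth_ctrl)  # case minus control
    delta_expr = mean(expr_case) - mean(expr_ctrl)
    # Correlate methylation with expression across all samples pooled together.
    r = pearson(meth_case + meth_ctrl, expr_case + expr_ctrl)
    direction = "negative" if r < 0 else "positive" if r > 0 else "none"
    return {"delta_meth": delta_meth, "delta_expr": delta_expr,
            "r": r, "direction": direction}


# Hypothetical toy values for one gene and two of its regions (not real data).
regions = {
    "TSS1500": ([0.8, 0.7, 0.9], [0.2, 0.3, 0.1]),       # (cases, controls)
    "gene body": ([0.3, 0.4, 0.35], [0.6, 0.5, 0.7]),
}
expr_case, expr_ctrl = [1.0, 2.0, 1.5], [5.0, 4.0, 6.0]

summaries = {
    name: region_summary(m_case, m_ctrl, expr_case, expr_ctrl)
    for name, (m_case, m_ctrl) in regions.items()
}
# Region whose methylation differs most between cases and controls.
largest = max(summaries, key=lambda n: abs(summaries[n]["delta_meth"]))
```

In this toy setup the two regions show opposite signs of correlation with expression, which mirrors the abstract's point that both negative and positive relationships occur. A real analysis would also need significance testing and multiple-testing correction, which this sketch omits.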
Keywords/Search Tags:DNA methylation, Differentially methylated loci, Differentially methylated regions, gene network, complex disease