
Functional Patterns Mining And Analyzing Based On Epigenetic Data

Posted on: 2017-01-17 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: X F Yang
GTID: 1368330542992893 | Subject: Computer application technology
Abstract/Summary:
Bioinformatics is an interdisciplinary field spanning mathematics, computer science, and biology. It aims to solve biological problems and explore the mysteries of life using theories and methods from mathematics and computer science. With the development of next-generation sequencing technology, massive amounts of biological data have been generated. These data provide an abundant foundation for mining functional patterns and analyzing the biological implications of different biological molecules. Epigenetics, one of the most important components of functional genomics, is the study of heritable changes in gene expression that occur without changes to the DNA sequence. DNA methylation, noncoding RNA, and histone modification are three key epigenetic factors, and they have been shown to play very important roles in many biological processes, such as cell development, differentiation, and complex diseases (including diverse cancers). Therefore, mining and analyzing the functional patterns of different epigenetic factors in different cell types or samples is significant for understanding cell development and the mechanisms of complex diseases, and for advancing disease diagnosis and therapy. Based on the available data on epigenetic factors, this dissertation investigates methods for mining and analyzing the functional patterns of DNA methylation and long noncoding RNA (lncRNA) across different cell types or samples. This includes mining and analyzing common and specific DNA methylation patterns within and across multiple cell lines, investigating pan-cancer and cancer-specific DNA methylation patterns, exploring the patterns of lncRNAs implicated in human diseases, and predicting potential disease-associated lncRNAs. Specifically, we investigate the following problems and make the corresponding contributions.

1. Mining and analyzing the common and specific DNA methylation patterns within and across 54 different cell lines based on the DNA methylation data in ENCODE. First, a region called a local cluster of CpG sites (LCCS) is defined and detected via a greedy algorithm, and 35276 LCCSs are identified across the whole genome. Then, an LCCS co-methylation network is constructed to investigate the common DNA methylation patterns across all cell lines. Seven co-methylation modules are detected, revealing two distinct groups in terms of their methylation level and genomic characteristics. Furthermore, cell lineage-specific high- and low-methylation patterns are mined; these patterns are depleted in promoter, CpG island (CGI), and repeat regions but enriched in gene body and non-CGI regions, especially CGI shore regions. In addition, cell lineage-specific low-methylated LCCSs are enriched with functional transcription factor binding motifs. Moreover, the detected cell line-specific high- and low-methylated patterns show distinct enrichments in cell line-specific chromatin states and are functionally relevant to the corresponding cell lines.

2. Exploring pan-cancer and cancer-specific DNA methylation patterns among 5480 DNA methylation profiles of 15 cancer types from TCGA. First, the differentially methylated CpG sites (DMCs) in each cancer are detected, and pan-cancer DMCs (PDMCs) and cancer-specific DMCs (csDMCs) are defined. Then, 5450 hypermethylated and 4433 hypomethylated PDMCs are identified. Intriguingly, three adjacent hypermethylated PDMCs constitute an enhancer region that potentially negatively regulates two tumor suppressor genes, BVES and PRDM1. Moreover, six distinct motif clusters enriched in hyper- or hypomethylated PDMCs are detected, and they are associated with several well-known cancer hallmarks. Eight distinct pan-cancer methylation-expression gene groups are identified, and they are enriched in cancer-associated pathways. Additionally, 55 hypermethylated and 7 hypomethylated PDMCs are significantly associated with patient survival and can be used as features for dividing patients into high- and low-risk groups. Lastly, cancer-specific DMCs are determined; they are enriched in known cancer genes, cell-type-specific methylation marks, and super-enhancers, indicating their underlying implications for individual cancers.

3. Based on the available lncRNA-disease associations, a lncRNA-disease association network is constructed, and two related networks, a lncRNA-implicated disease network and a disease-associated lncRNA network, are derived from it. Analysis of the topological properties of these networks shows that they are biologically significant. Moreover, clustering analysis indicates that diseases in the same class are more likely to cluster together, which suggests that diseases in the same class are associated with the same lncRNAs. Furthermore, a coding-noncoding gene-disease bipartite network is constructed, and an algorithm is proposed to uncover hidden lncRNA-disease associations in this network. The algorithm is evaluated by leave-one-out cross-validation and achieves an AUC of 0.7881, demonstrating its ability to predict potential lncRNA-disease associations. Finally, the algorithm predicts 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. The results for Alzheimer's disease, pancreatic cancer, and gastric cancer are verified by other independent studies, indicating that the predicted results are valuable for further investigation.
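Item 1 states only that LCCSs (local clusters of CpG sites) are detected with a greedy algorithm; the exact criteria are not given in this summary. The following is a minimal, hypothetical sketch of one common greedy scheme. It assumes an LCCS is a run of consecutive CpG sites on one chromosome in which adjacent sites are at most `max_gap` bp apart and the run contains at least `min_sites` sites. The function name `greedy_lccs` and both thresholds are illustrative assumptions, not the dissertation's definition.

```python
from typing import List, Tuple


def greedy_lccs(positions: List[int], max_gap: int = 100,
                min_sites: int = 3) -> List[Tuple[int, int, int]]:
    """Hypothetical greedy clustering of CpG positions on one chromosome.

    Scans the sorted positions once, extending the current cluster while the
    gap to the next CpG is <= max_gap (an assumed threshold). A cluster is kept
    only if it contains at least min_sites CpGs (also assumed).
    Returns a list of (start, end, n_sites) tuples.
    """
    clusters: List[Tuple[int, int, int]] = []
    sites = sorted(positions)
    if not sites:
        return clusters
    start = prev = sites[0]
    count = 1
    for pos in sites[1:]:
        if pos - prev <= max_gap:
            # Close enough: greedily extend the current cluster.
            count += 1
        else:
            # Gap too large: close the current cluster and open a new one.
            if count >= min_sites:
                clusters.append((start, prev, count))
            start, count = pos, 1
        prev = pos
    # Flush the last open cluster.
    if count >= min_sites:
        clusters.append((start, prev, count))
    return clusters


if __name__ == "__main__":
    toy_cpgs = [10, 40, 90, 500, 520, 2000, 2050, 2100, 2180]  # toy positions
    print(greedy_lccs(toy_cpgs))
```

With these toy positions the call prints `[(10, 90, 3), (2000, 2180, 4)]`. The pair at 500 and 520 is discarded because it has fewer than three sites. A single linear pass keeps the scan O(n) per chromosome after sorting, which is why greedy extension is a natural fit for genome-wide scans.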
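Item 2 defines pan-cancer DMCs (PDMCs) and cancer-specific DMCs (csDMCs), but this summary does not state the criteria. A minimal sketch under stated assumptions: DMCs have already been called per cancer type with a direction (`+1` hyper, `-1` hypo). A PDMC is taken to be a site called in the same direction in at least `min_cancers` cancer types and never in the opposite direction. A csDMC is taken to be a site that is a DMC in exactly one cancer type. The rule, the threshold, the data layout, and the name `classify_dmcs` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def classify_dmcs(dmcs: Dict[str, Dict[str, int]], min_cancers: int = 10
                  ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """dmcs maps cancer type -> {CpG probe ID: +1 (hyper) or -1 (hypo)}.

    Returns (pdmcs, csdmcs) under a hypothetical rule:
      pdmcs["hyper"] / pdmcs["hypo"] -> probes with one consistent direction
                                        in >= min_cancers cancer types;
      csdmcs[cancer]                 -> probes that are DMCs in that cancer only.
    """
    # For each probe, record which cancer types call it hyper (+1) or hypo (-1).
    calls: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for cancer, sites in dmcs.items():
        for probe, direction in sites.items():
            calls[probe][direction].add(cancer)

    pdmcs: Dict[str, Set[str]] = {"hyper": set(), "hypo": set()}
    csdmcs: Dict[str, Set[str]] = defaultdict(set)
    for probe, by_dir in calls.items():
        hyper, hypo = by_dir.get(1, set()), by_dir.get(-1, set())
        # Pan-cancer: enough cancers agree and none disagree.
        if len(hyper) >= min_cancers and not hypo:
            pdmcs["hyper"].add(probe)
        elif len(hypo) >= min_cancers and not hyper:
            pdmcs["hypo"].add(probe)
        # Cancer-specific: differentially methylated in exactly one cancer type.
        all_cancers = hyper | hypo
        if len(all_cancers) == 1:
            csdmcs[next(iter(all_cancers))].add(probe)
    return pdmcs, dict(csdmcs)


if __name__ == "__main__":
    toy = {  # toy calls; probe IDs are made up
        "BRCA": {"cg01": 1, "cg02": -1, "cg03": 1},
        "LUAD": {"cg01": 1, "cg02": -1},
        "COAD": {"cg01": 1, "cg04": -1},
    }
    pdmcs, csdmcs = classify_dmcs(toy, min_cancers=2)
    print(pdmcs)
    print(csdmcs)
```

On this toy input, `cg01` and `cg02` become a hyper- and a hypo-PDMC respectively, while `cg03` and `cg04` are csDMCs of BRCA and COAD. Requiring no opposite-direction calls is one conservative choice; a majority-vote rule would be an equally plausible alternative.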
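Item 3 derives a lncRNA-implicated disease network and a disease-associated lncRNA network from the lncRNA-disease association network. A common way to obtain such networks, assumed here rather than taken from the dissertation, is a one-mode projection of the bipartite graph. Two diseases are linked if they share at least one lncRNA, with the edge weighted by the number of shared lncRNAs; lncRNAs are linked symmetrically through shared diseases. The function `project` and the toy node names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def project(pairs: Iterable[Edge], keep: int) -> Dict[Edge, int]:
    """One-mode projection of a bipartite (lncRNA, disease) edge list.

    keep=1 links diseases that share lncRNAs; keep=0 links lncRNAs that share
    diseases. Each edge (a, b) with a < b is weighted by the number of shared
    neighbours on the other side.
    """
    if keep not in (0, 1):
        raise ValueError("keep must be 0 (lncRNA side) or 1 (disease side)")
    # Node on the projected-away side -> set of nodes on the kept side.
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for pair in pairs:
        other, kept = pair[1 - keep], pair[keep]
        neighbours[other].add(kept)
    weights: Dict[Edge, int] = defaultdict(int)
    for group in neighbours.values():
        # Every pair of kept-side nodes sharing this neighbour gains weight 1.
        for a, b in combinations(sorted(group), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    toy_pairs = [("lncA", "D1"), ("lncA", "D2"), ("lncB", "D1"),
                 ("lncB", "D2"), ("lncC", "D3")]  # toy associations
    print(project(toy_pairs, keep=1))  # disease network
    print(project(toy_pairs, keep=0))  # lncRNA network
```

Here the disease network contains a single edge `("D1", "D2")` of weight 2 (both diseases share `lncA` and `lncB`), and the lncRNA network contains `("lncA", "lncB")` of weight 2. `D3` and `lncC` are isolated because they share no neighbours.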
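Item 3 reports an AUC of 0.7881 under leave-one-out cross-validation. The prediction algorithm itself is not described in this summary, so the sketch below covers only the scoring step. It assumes a typical protocol in which each known association is held out in turn, rescored, and compared against candidate pairs with no known association. It computes the ROC AUC in its Mann-Whitney form: the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half. The scores are made-up numbers and do not reproduce the reported result.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney formulation: P(pos > neg) + 0.5 * P(tie)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties contribute half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    held_out = [0.9, 0.8, 0.4]         # hypothetical scores of held-out positives
    candidates = [0.7, 0.4, 0.1, 0.3]  # hypothetical scores of unlabeled pairs
    print(auc(held_out, candidates))
```

The example returns 0.875: 10.5 of the 12 positive-negative pairs are ordered correctly, counting the tie at 0.4 as one half. The pairwise loop is O(n·m); for large candidate sets a rank-sum computation gives the same value more efficiently.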
Keywords/Search Tags: Epigenetics, DNA methylation, long noncoding RNA, complex disease, functional pattern