
Research On Epigenomic Data Analysis Methods Based On Wavelet Transform

Posted on: 2014-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
Full Text: PDF
GTID: 2254330398989955
Subject: Drug Analysis
Abstract/Summary:
As an emerging sub-discipline of genetics, epigenetics has attracted increasing attention from the scientific community in recent years. Epigenetics complements traditional genetics. Traditional genetics focuses on changes in gene expression caused by alterations of the gene sequence, such as gene mutation and gene loss. Epigenetics, by contrast, studies changes in gene expression that occur without any alteration of the gene sequence, through mechanisms such as DNA methylation, RNA interference, and histone modification.

The Encyclopedia of DNA Elements (ENCODE) project is sponsored by the U.S. National Human Genome Research Institute (NHGRI). It is a major international collaboration in which many countries participate. ENCODE produces large amounts of data in the form of continuous signals over a variety of genomic intervals, but the interactions among these data are still largely unknown. A computational method that can quantitatively evaluate the interactions between ENCODE data of different types is therefore urgently needed.

At present, we lack effective methods for analysing the epigenome, sound strategies for integrating genomic and epigenomic data, and analytical means for studying associations with disease.

The wavelet transform decomposes an epigenomic signal into multiple scales and denoises it, so that the characteristics of the data can be observed at different scales. It processes data at different scales and resolutions: a large "window" (large scale) shows the overall behaviour of the signal, while a small "window" (small scale) reveals fine features. Figuratively speaking, wavelet analysis lets us see not only the forest but also the trees. As a classic mathematical tool, the wavelet transform can denoise a signal, and in multi-scale processing it both preserves the original nature of the signal and exposes its characteristics at different scales.
The wavelet transform is therefore well suited to observing the regulatory mechanisms of the epigenome and the functional domain structure of chromosomes at different scales.

Based on the wavelet transform, we propose a new method for epigenome analysis. It allows us to process epigenomic signals at different scales, examine the correlation of epigenetic information, and re-identify the functional domain structure of chromosomes. We can perform multi-scale analysis of the epigenome and study the functional domain structure of chromosomes from histone modifications in order to interpret epigenomic profiles. The results show that the method is applicable to analysing the interactions between different experimental data types and to identifying functional domains and functional elements of the human genome.

Our research targets multi-scale, continuous, high-density epigenomic data sets and applies a wavelet correlation analysis method (WCO) to study the correlation between epigenomic data sets, supporting visualization, quantification, and statistical analysis. The specific aims are:
(1) to test the formal statistics of wavelet correlation and verify whether the WCO method can properly be applied to histone modification data;
(2) to describe the wavelet correlation patterns between histone modifications (within a cell line) and between cell lines (for a given histone modification), and to evaluate how closely the histone modifications are related;
(3) to explore the wavelet correlation of activating and repressive modifications and to identify bivalent domains in the ENCODE pilot regions.

The original data are the 44 ENCODE pilot regions, comprising 14 regions ranging from 500 kbp to 2 Mbp in size and 30 regions of 500 kbp each. We analyse wavelet correlation from four aspects.

(1) We test the wavelet correlation of 9 histone modifications in a single cell line at ENCODE pilot region ENm004. At the 16 kbp scale, we analyse the wavelet correlation between every pair of histone modifications.
At the 8 kbp, 32 kbp, and 64 kbp scales, we study the wavelet correlation between H3K4me2 and H3K4me3 and analyse the smoothed distributions for every pair of histone modifications.

(2) The initial observations are extended to the other ENCODE pilot regions by applying the same strategy to the remaining 43 regions. At the 16 kbp scale, we analyse the distribution of the wavelet correlation between H3K4me2 and H3K4me3 in the GM06990 cell line across the 44 ENCODE pilot regions. At the 8 kbp, 32 kbp, and 64 kbp scales, we likewise study the distribution of the wavelet correlation between H3K4me2 and H3K4me3 across the 44 regions. Across multiple scales, we analyse the mean wavelet correlation between every pair of histone modifications in the GM06990 cell line. We also statistically analyse the smoothed distribution of the F statistic in specific regions, the smoothed distribution of the F statistic against gene density, and the smoothed distribution of the F statistic at conserved sequences. At the 16 kbp scale, we study the wavelet correlation between H3K4me2 and H3K4me3 in the HeLa-S3 cell line at ENm004, in relation to the wavelet correlation curve, the multi-scale smoothed distribution, and the signal distribution. At the same scale, we also analyse the distribution of the wavelet correlation between H3K4 methylation and histone acetylation in the HeLa-S3 cell line across the 44 ENCODE pilot regions.

(3) We analyse H3K4 methylation and histone acetylation in the GM06990 and HeLa-S3 cell lines. At the 16 kbp scale, we study the wavelet correlation of H3K4me3 between the GM06990 and HeLa-S3 cell lines, in relation to the wavelet correlation curve, the multi-scale smoothed distribution, and the 16 kbp signal distribution. Also at 16 kbp, we test the distribution of the wavelet correlation between H3K4 methylation and histone acetylation in the GM06990 and HeLa-S3 cell lines across the 44 ENCODE pilot regions.

(4) We identify sites where the H3K4me3 and H3K27me3 signals co-occur in the ENCODE pilot regions at the 5% significance level, and study the wavelet correlation between H3K4me3 and H3K27me3 in bivalent domains.
We test for bivalent domains in the areas of GM06990 where activating and repressive modifications overlap. After careful examination, we found that 43 bivalent domains in the pilot regions lie far from upstream and downstream genes, which suggests that the bivalent domains we analyse are likely to be widespread in the human genome.

The analysis above shows that the correlations between histone modifications are similar across pilot regions and across cell lines. Some histone modifications show high wavelet correlation in different cell lines. When analysing the wavelet correlation of histone modifications in each ENCODE pilot region, we found that two marks with the same enrichment show high wavelet correlation. We also found that the wavelet correlation is closely related to the multi-scale distribution patterns of genomic features in the ENCODE pilot regions. The identified wavelet correlation patterns can be used to test models proposed to explain the function of histone modifications, such as the histone code, signalling network, and charge neutralization models. In addition, the wavelet correlation analysis of activating and repressive modifications shows that the method can re-identify bivalent domains, is widely applicable to exploring the interactions among different experimental data types, and can identify functional elements and functional domains of the human genome.
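
The idea of a scale-by-scale wavelet correlation between two signal tracks can be illustrated with a minimal sketch. This is not the thesis's WCO implementation, whose details the abstract does not give. It assumes a plain Haar wavelet, a decimated transform, and an ordinary Pearson correlation of the detail coefficients at each level. The function names (`haar_details`, `pearson`, `wavelet_correlation`) and the toy signals are hypothetical and chosen only for illustration.

```python
import math


def haar_details(signal, levels):
    """Return Haar detail coefficients per level, finest scale first.

    Assumption: a simple decimated Haar transform; an unpaired last
    sample at any level is dropped.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        starts = range(0, len(approx) - 1, 2)
        # Detail = scaled difference of neighbours, approximation = scaled sum.
        details.append([(approx[i] - approx[i + 1]) / math.sqrt(2) for i in starts])
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in starts]
    return details


def pearson(a, b):
    """Pearson correlation of two equal-length lists; NaN if either is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def wavelet_correlation(x, y, levels):
    """Correlate the detail coefficients of x and y at each level."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    dx = haar_details(x, levels)
    dy = haar_details(y, levels)
    return [pearson(a, b) for a, b in zip(dx, dy)]


if __name__ == "__main__":
    # Toy tracks standing in for two binned histone-modification signals.
    track_a = [1, 3, 2, 5, 4, 8, 6, 7]
    track_b = [2, 5, 3, 9, 7, 12, 11, 10]
    for level, r in enumerate(wavelet_correlation(track_a, track_b, 2), start=1):
        print(f"level {level} (spans {2 ** level} bins): r = {r:.3f}")
```

In this sketch a coefficient at level j summarizes 2^j consecutive bins, so the genomic scale of a level depends on the bin size chosen for the input tracks. Published wavelet correlation analyses often use an undecimated (maximal overlap) transform instead, which keeps one coefficient per position at every scale.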
Keywords/Search Tags: epigenome, ENCODE, histone modification, wavelet correlation