Development And Application Of DNA Methylation Data Analysis Software

Posted on: 2020-10-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q W Zhou
Full Text: PDF
Epigenetic modification plays a key role in cell differentiation, development, and diseases such as cancer. DNA methylation is an important epigenetic modification. It has been reported to play an important role in plant immunity, cardiovascular disease, atherosclerosis, autism, neurodegenerative diseases, cancer treatment, kidney aging, and other processes. Studying genome-wide DNA methylation profiles is therefore of great significance for understanding the distribution and mechanism of DNA methylation and the regulatory relationship between DNA methylation and related diseases. This thesis presents the following work on whole-genome DNA methylation data:
1. An integrated package for bisulfite DNA methylation data analysis with indel-sensitive mapping.
2. A SNP calling tool for Bisulfite-Seq data.
3. Tools for DNA methylation haplotype assembly and for detecting allele-specific DNA methylation regions.
4. Application of the DNA methylation analysis pipeline to real BS-Seq data.

At present, the most important aspects of DNA methylation research are detection technology and data analysis tools. Whole-genome bisulfite sequencing (WGBS) is the current mainstream technology for detecting genome-wide DNA methylation at single-site resolution, so an easy-to-use analysis pipeline for WGBS data is very important. The main steps of DNA methylation data analysis are data preprocessing; read alignment; calculation of DNA methylation levels for individual loci, genomic regions, or functional regions such as genes and transposable elements; visualization; and differential methylation analysis. We developed tools for each of these analysis modules, which are described in Chapter 2. The resulting package, BatMeth2, can also generate an intuitive HTML report from the analysis results to make them easier to view.

During bisulfite treatment, unmethylated cytosines are converted to uracil and are read as thymine after sequencing, which greatly increases the difficulty of detecting single nucleotide polymorphisms (SNPs) in DNA methylation data. Chapter 3 introduces BSsnpcall, software developed to detect SNPs in bisulfite sequencing data. BSsnpcall detects 10% more SNPs than commonly used tools, and its accuracy is 5%-25% higher than that of other tools. In addition, its running time is only 1/2 that of one of the compared tools and 1/70 that of the other.

Alleles are alternative forms of a gene located at the same position on homologous chromosomes that control the same trait. Allele-specific gene expression is common not only on the X chromosome and in genomic imprinting but also on autosomes, and it plays a key role in common diseases. DNA methylation is a key factor in the differential expression of alleles. As described in Chapter 4, we developed MethyHaplo, a tool that assembles haplotypes by combining DNA methylation and SNP information and detects allele-specific DNA methylation. The results showed that:
1) Haplotypes assembled by combining DNA methylation and SNP information are ~10 times longer than those assembled from SNP information alone.
2) The results are reliable, as shown by comparing the genes with allele-specific DNA methylation detected by MethyHaplo against allele-specific expression genes and known imprinted genes.
3) Genome-wide, allele-specific DNA methylation is mainly distributed in exons and is significantly enriched at gene transcription start regions.
4) Allele-specific DNA methylation is mainly enriched in highly expressed genes and is relatively more frequent in regions with histone modifications associated with transcriptional activation.
5) Allele-specific DNA methylation is significantly enriched in allele-specific CTCF regions, and allele-specific DNA methylation is negatively correlated with allele-specific CTCF.

Chapter 5 describes the application of the DNA methylation pipeline package to rice BS-Seq data. In this chapter, BatMeth2 was used to analyze changes in DNA methylation in rice mutants of several DNA methylases. The results showed that:
1) In rice, CG and CHG methylation is mainly distributed in heterochromatin, whereas CHH methylation is distributed more in euchromatin.
2) The DNA methylase OsDRM2 is mainly responsible for CHH DNA methylation in rice.
3) The chromatin remodeling factor OsDDM1a/1b is mainly responsible for CG and CHG DNA methylation in rice.
4) OsDRM2 regulates the expression of related genes by regulating CHH DNA methylation on Miniature Inverted-repeat Transposable Elements (MITEs).
5) The DNA methylase OsDRM2 and OsDDM1 can coordinate to regulate CHH DNA methylation in rice.

In summary, to meet the practical requirements of DNA methylation data analysis, this thesis developed a DNA methylation data analysis pipeline, a SNP detection tool for DNA methylation data, and a haplotype assembly tool. Finally, the BatMeth2 package developed in this thesis was applied to the analysis of rice data.
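
Two ideas from the abstract can be illustrated with a minimal sketch: calculating methylation levels for single sites and regions, and the reason bisulfite conversion complicates SNP detection. The abstract does not give the formulas used by BatMeth2 or BSsnpcall, so the sketch assumes the commonly used ratio C / (C + T) of read counts at a cytosine. All names in it (`SiteCounts`, `methylation_level`, `region_level`, `bisulfite_convert`) are hypothetical and are not part of either tool.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read counts at one cytosine (hypothetical structure, not a BatMeth2 format)."""
    methylated: int    # reads showing C (protected from conversion)
    unmethylated: int  # reads showing T (converted by bisulfite treatment)


def methylation_level(site, min_coverage=1):
    """Assumed definition: C / (C + T); None when coverage is below min_coverage."""
    coverage = site.methylated + site.unmethylated
    if coverage < min_coverage:
        return None
    return site.methylated / coverage


def region_level(sites, min_coverage=1):
    """Coverage-weighted level of a region: sum(C) / sum(C + T) over kept sites."""
    kept = [s for s in sites if s.methylated + s.unmethylated >= min_coverage]
    total = sum(s.methylated + s.unmethylated for s in kept)
    if total == 0:
        return None
    return sum(s.methylated for s in kept) / total


def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate forward-strand reads: every unmethylated C is read as T."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


# An unmethylated reference "ACG" and a sample carrying a true C>T SNP ("ATG")
# give identical reads after conversion, so forward-strand reads alone cannot
# tell the SNP apart from an unmethylated cytosine.
ambiguous = bisulfite_convert("ACG") == bisulfite_convert("ATG")
```

In this sketch, a site with 8 methylated and 2 unmethylated reads has a level of 0.8, and a region level is weighted by coverage rather than averaged per site. A practical SNP caller for bisulfite data would need additional evidence, such as reads from the opposite strand, to resolve the C/T ambiguity; how BSsnpcall does this is not described in the abstract.
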
Keywords/Search Tags: DNA methylation, whole genome bisulfite sequencing, DNA methylation analysis pipeline, DNA methylation haplotype assembly, allele-specific DNA methylation, Single Nucleotide Polymorphism
