Development And Application Of Software For Haplotype-Resolved Hi-C

Posted on: 2020-01-13    Degree: Master    Type: Thesis
Country: China    Candidate: H Luo    Full Text: PDF
GTID: 2370330572984760    Subject: Bioinformatics
Abstract/Summary:
Studies based on the chromosome conformation capture technique and its derivatives (such as Hi-C) have shown that three-dimensional (high-order) chromatin structures play important roles in transcriptional regulation, DNA replication, early embryonic development, and disease development. In recent years, researchers have used genetic variants to study the characteristics and functions of haplotype-level high-order chromatin structures. The results showed that the high-order chromatin structures derived from the mother and the father differ significantly in specific chromatin regions or developmental stages, which means that high-order chromatin structures exhibit allelic specificity. Moreover, the allelic specificity of high-order chromatin structures is related to the allelic effect of gene expression, suggesting that haplotype-resolved high-order chromatin structures may regulate allele-specific gene expression.

In this dissertation, we use the term "haplotype-resolved Hi-C" to denote the assignment of Hi-C data to paternal or maternal chromatin interactions by using phased genetic variants. However, genetic variants are sparse and unevenly distributed across the genome, which lowers data utilization and introduces systematic bias into haplotype-resolved Hi-C data. Optimizing the haplotype-resolved Hi-C pipeline is therefore important for improving the precision and accuracy of haplotype-resolved high-order chromatin structures.

In this thesis, a novel software package called HiCHap was developed for analyzing haplotype-resolved high-order chromatin structures based on Hi-C data. Its main functions are: using all sequence information in Hi-C reads to improve haplotype data utilization; using a two-step correction strategy to correct the biases caused by unevenly distributed genetic variants and by the Hi-C experiment; and adapting traditional methods to obtain haplotype-resolved high-order chromatin structures. We also used HiCHap to study the association between the allelic specificity of chromatin loops and the allelic specificity of key transcription factors.

First, HiCHap uses all heterozygous single nucleotide polymorphisms (SNPs) on the Hi-C reads to improve data utilization. Traditional Hi-C pipelines usually either handle reads that span the junction site or use iterative mapping, and in this process part of the sequence is discarded. In HiCHap, all valid DNA sequences in the reads are mapped to the genome, so that all sequence information on the Hi-C reads is used.

Second, HiCHap uses a two-step correction strategy to correct the systematic biases caused by the SNP density distribution and by the Hi-C experiment, and constructs the haplotype-resolved interaction matrix at the same time. Because the relationship between the distribution of genetic variants and the data utilization of haplotypes is complex, in the first step HiCHap uses the raw data to measure the influence of SNPs indirectly and uses an asymmetric matrix to correct the bias caused by the distribution of genetic variants. In the second step, the matrix is symmetrized and the bias from the Hi-C experiment is corrected with a matrix balancing algorithm. Compared with the traditional method, the correction algorithm in this thesis performs better in many respects.

Then, taking chromatin loops as an example, haplotype-resolved high-order chromatin structures are identified by adapting the traditional algorithms. Based on the integration of the haplotype-resolved chromatin loops and the traditional chromatin loops, the binomial distribution is used to test the significance of the difference between the paternal loop and the maternal loop. Allele-specific chromatin loops are then identified and screened.

Finally, this thesis analyzes the association between allele-specific loops and the allelic effect of key transcription factor binding sites (such as those of CTCF and cohesin). The results show that allele-specific transcription factor binding sites often appear at the anchors of allele-specific chromatin loops. Moreover, the allelic effect of chromatin loops is positively correlated with the allelic effect of transcription factor binding sites, suggesting that the allelic effect of key transcription factor binding sites may be one of the important factors in the formation of allele-specific chromatin loops.
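
The binomial test for paternal versus maternal loops described in the abstract can be illustrated with a minimal sketch. This is not HiCHap's code: the abstract states only that the binomial distribution is used. The sketch assumes that each loop is summarized by a maternal and a paternal contact count, that the null hypothesis is an equal split (p = 0.5), and that a loop is called allele-specific below a significance level alpha. The function names, the counts, and the threshold are all hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under Binomial(n, p).
    """
    if n == 0:
        return 1.0  # no contacts, so there is no evidence of imbalance
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties are not lost to rounding.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


def allelic_loop_call(maternal: int, paternal: int, alpha: float = 0.05):
    """Label one loop as 'maternal', 'paternal' or 'balanced'.

    Assumption: maternal and paternal are haplotype-assigned contact
    counts for the same loop, tested against an equal split.
    """
    pval = binomial_two_sided_p(maternal, maternal + paternal)
    if pval >= alpha or maternal == paternal:
        return "balanced", pval
    return ("maternal" if maternal > paternal else "paternal"), pval


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for counts in [(30, 10), (12, 10), (3, 25)]:
        label, pval = allelic_loop_call(*counts)
        print(counts, label, f"p = {pval:.4g}")
```

In practice the counts would come from the bias-corrected haplotype-resolved matrices, and the p-values from many loops would normally be adjusted for multiple testing before screening.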
Keywords/Search Tags: haplotype, Hi-C, software development, chromatin loop, allelic specificity