Study On Chromosome Three-dimensional Reconstruction And Co-regulation Areas Based On Hi-C Data

Posted on: 2020-03-12 | Degree: Master | Type: Thesis
Country: China | Candidate: J Mu
GTID: 2370330590954224 | Subject: Communication and Information System
Abstract/Summary:
Three-dimensional reconstruction of chromosomes has become a research hotspot in epigenetics and genomics in recent years. One of the most important approaches uses sequencing technology to obtain information about the spatial structure of chromosomes, that is, to predict their three-dimensional shape in the nucleus from two-dimensional contact frequency data. A growing number of studies show that the three-dimensional structure of chromosomes has an important impact on cellular processes such as DNA transcription, replication, and modification. Using 3D reconstruction to reveal chromosomal spatial interaction networks and gene co-regulation regions helps us understand the complex behavior of genomes at different levels and in different dimensions. Following the emergence of 3C-based chromosome conformation capture technologies, high-throughput Hi-C sequencing can detect chromosomal spatial contacts across the whole genome efficiently and accurately, which makes it possible to construct genome-wide three-dimensional structures.

This paper completes three parts of work. First, it systematically studies the two most representative methods for modeling the three-dimensional structure of chromosomes: the ShRec3D algorithm and the maximum likelihood algorithm. Their advantages and disadvantages are compared comprehensively, and, building on their strengths while avoiding their weaknesses, the theoretical groundwork for a variable step size adaptive maximum likelihood algorithm is laid out. Second, based on the statistical characteristics of yeast Hi-C data, distribution models are established for the contact matrices of the 16 chromosomes. On this basis, the variable step size adaptive maximum likelihood algorithm is used to reconstruct the three-dimensional structure of the yeast chromosomes, which provides a visual basis for analyzing co-regulation regions related to chromosome structure. Third, based on the three-dimensional reconstruction model, the paper analyzes the chromosome structure and gene properties of specific co-regulatory regions of the yeast chromosomes.

The specific experiments and conclusions are as follows. Reconstructing the yeast Hi-C data with ShRec3D and with the maximum likelihood algorithm separately exposes two limitations: (1) ShRec3D relies on a distance conversion function whose fixed conversion parameter lacks adaptiveness in practical applications; (2) the maximum likelihood algorithm uses the same objective function and a fixed learning rate throughout its iterations, which is an obvious limitation when handling data from different chromosomes. The variable step size adaptive maximum likelihood algorithm is designed to address these shortcomings. When reconstructing a chromosome, it finds the conversion parameter that maximizes the distance Spearman correlation coefficient for that chromosome's Hi-C data, so the optimal parameter of the distance conversion function is obtained automatically. The distance matrix produced by the distance conversion function is then corrected with a shortest path algorithm so that it satisfies the distance constraints of real geometric space, which reduces the error. To overcome the original maximum likelihood algorithm's reliance on a single objective function and to improve adaptability, the paper proposes fitting the distribution characteristics of the Hi-C data and ultimately adopts an independent and identically distributed model with a Gaussian kernel function, which offers a high goodness of fit and a small error. During iterative optimization of the objective function, an adaptive learning rate strategy is introduced into the gradient ascent algorithm, which further improves the adaptability and accuracy of the model. The reconstructed three-dimensional chromosome structures are evaluated with three indicators: DSCC, DPCC, and DRMSE.

Finally, the paper presents a practical application of the three-dimensional chromosome structure model to the study of gene co-regulatory domains. After locating several co-regulatory domains in the folded regions of the chromosomes, it analyzes, at the transcription start sites of genes in those domains and in the 1000 bp downstream, the nucleosome occupancy, the protein replacement level, the histone modification level, and the distribution of RNA polymerase II (Pol II).
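The distance conversion, shortest path correction, and evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and it rests on several assumptions: the conversion function is taken to be the power law d = f^(-alpha) commonly associated with ShRec3D; DSCC, DPCC, and DRMSE are read as the distance Spearman correlation, distance Pearson correlation, and distance root-mean-square error; and the parameter search compares against a reference distance list, since the abstract does not say what the correlation is computed against. All function names are hypothetical.

```python
import math

def contacts_to_distances(freq, alpha):
    """Assumed power-law conversion d = f ** (-alpha); zero contacts map to infinity."""
    n = len(freq)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                f = freq[i][j]
                dist[i][j] = f ** (-alpha) if f > 0 else math.inf
    return dist

def shortest_path_closure(dist):
    """Floyd-Warshall: replace each entry by the shortest path length,
    so the result satisfies the triangle inequality."""
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via = d[i][k] + d[k][j]
                if via < d[i][j]:
                    d[i][j] = via
    return d

def upper_triangle(m):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation; inputs must not be constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(model_dist, reference_dist):
    """Assumed reading of the three indicators named in the abstract."""
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(model_dist, reference_dist))
                     / len(model_dist))
    return {"DSCC": spearman(model_dist, reference_dist),
            "DPCC": pearson(model_dist, reference_dist),
            "DRMSE": rmse}

def select_alpha(freq, reference_dist, alphas):
    """Grid search (an assumption): keep the alpha with the highest Spearman score.
    Ranks of f ** (-alpha) do not depend on alpha, so only the shortest path
    correction makes the score vary between candidates."""
    best = None
    for alpha in alphas:
        closed = shortest_path_closure(contacts_to_distances(freq, alpha))
        score = spearman(upper_triangle(closed), reference_dist)
        if best is None or score > best[1]:
            best = (alpha, score)
    return best
```

For example, calling `select_alpha(freq, reference_dist, [0.5, 1.0, 2.0])` returns the best candidate together with its score. The sketch stops at the corrected distance matrix; turning that matrix into 3D coordinates, and the adaptive learning rate maximum likelihood step, are outside its scope.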
Keywords/Search Tags: chromosome three-dimensional structure reconstruction, sequencing technique, Hi-C data, co-regulatory area, gene analysis