
Research On Hi-C Data Resolution Based On Generative Adversarial Network

Posted on: 2021-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Bai
GTID: 2480306230478124
Subject: Software engineering
Abstract/Summary:
The genome is organized hierarchically in three-dimensional (3D) space within the nucleus and displays multiple layers of functional complexity, including chromosome territories, megabase-scale topological domains, and DNA loops between cis-regulatory elements. A comprehensive understanding of the relationship between genome structure and function is an important but extremely difficult technical challenge. Hi-C data describe the three-dimensional structure of DNA, and high-resolution Hi-C data in particular help explain how the organization of chromosomes at different levels supports normal cell activity. However, low-resolution Hi-C data contain little useful information (concentrated near the diagonal of the contact matrix), a large amount of noise, and an uncertain noise distribution, which makes subsequent genomics research extremely difficult. Improving the quality of Hi-C data, removing noise, and determining the boundaries of the information it carries are therefore necessary conditions for subsequent bioinformatics research.

Generative adversarial network (GAN) models can generate diverse data by learning complex sample distributions, and their success provides a reference for restoring the original information in low-resolution Hi-C data. On this basis, this thesis proposes two super-resolution generative adversarial networks for Hi-C data, HiC-Densenet GAN and HiC-CGAN. Because Hi-C data are noisy and the noise distribution is uncertain, both networks learn an end-to-end mapping from low resolution to high resolution. The generator network G of HiC-Densenet GAN uses residual-in-residual dense blocks (RRDB) with a residual scaling factor to increase the depth of the generator, extracting high-frequency and low-frequency features of the data while removing noise. A gradient penalty term is added to the discriminator loss function so that the discriminator network D converges stably during training, avoiding "mode collapse" and reaching the optimal solution. HiC-CGAN is a generative adversarial network with a symmetric structure, conditioned on low-resolution Hi-C data. The data generated by generator G and the original high-resolution data are concatenated and fed into discriminator D, which learns the difference between the generated data and the target data; this makes D's supervision of G more targeted during training.

In the experiments, the models are trained mainly on the GM12878 data set and validated on the K562 data set. During training, the network input is obtained by downsampling high-resolution data. The experimental results show that the proposed models can remove noise from Hi-C data and improve data quality, which helps in mining the biological information that Hi-C data contain.
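The downsampling step used to produce training inputs can be illustrated with a small sketch. This is not the thesis's own code. The function name `downsample_contacts`, the 1/16 keep fraction, and the toy 3x3 matrix are all assumptions made for illustration, and the abstract does not state which downsampling scheme or ratio was used. The sketch applies binomial thinning: each observed contact is kept independently with a fixed probability, which mimics sequencing the same library at lower depth.

```python
import random


def downsample_contacts(matrix, keep_fraction, seed=0):
    """Binomially thin a symmetric Hi-C contact matrix (hypothetical helper).

    Each contact counted in matrix[i][j] is kept with probability
    keep_fraction. Only the upper triangle is sampled and then mirrored,
    so the result stays symmetric.
    """
    rng = random.Random(seed)
    n = len(matrix)
    low = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Keep each of the matrix[i][j] contacts independently.
            kept = sum(
                1 for _ in range(matrix[i][j]) if rng.random() < keep_fraction
            )
            low[i][j] = kept
            low[j][i] = kept
    return low


# Toy "high-resolution" contact matrix; the values are made up.
high_res = [
    [50, 20, 5],
    [20, 60, 18],
    [5, 18, 40],
]

# Assumed ratio: keep roughly 1 in 16 contacts.
low_res = downsample_contacts(high_res, keep_fraction=1 / 16)
```

In a setup like this, pairs of `low_res` and `high_res` matrices would serve as the input and the target of the generator, respectively.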
Keywords/Search Tags:Adversarial