Identification And Study Of DNA Protein Binding Sites Based On DNase High-Throughput Sequencing Information

Posted on: 2019-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: H Liu
GTID: 2370330548487367
Subject: Control Science and Engineering
Abstract/Summary:
Since the Human Genome Project, the study of transcription factors and their binding sites has become an important topic in exploring the essence of life. Transcription factors regulate gene expression only after binding to specific binding sites, so accurate identification of transcription factor binding sites has become one of the core problems in genetics. The DNase-seq technology proposed by Crawford in 2010 can detect protein binding sites across the whole genome in a single experiment. Compared with the widely used ChIP-seq and ATAC-seq technologies, DNase-seq has several advantages: it greatly reduces cost while improving the efficiency and accuracy of detection. In this research, we first downloaded DNase-seq data and transcription factor data from the web. Second, we used the FIMO tool to obtain the exact transcription factor binding sites, then extracted the DNase-seq data at each binding site to construct the data set. Based on these DNase-seq data, we then selected and constructed a recognition model for transcription factor binding sites built on an AutoEncoder neural network. Next, we analyzed the outputs of the AutoEncoder neural network with respect to the typicality of the transcription factor binding sites and the sequencing depth, so as to determine the conditions required for using the recognition model. Finally, because the data set is imbalanced, we used sensitivity, specificity, and the Matthews correlation coefficient to evaluate the constructed prediction model, both measuring its recognition ability and verifying its reliability. The evaluation of the prediction results shows that the designed model can effectively recognize transcription factor binding sites from DNase-seq data, laying the foundation for constructing a genome-wide transcription factor regulatory network.
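The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual implementation, so the function names and the toy labels below are assumptions made for illustration only; the formulas are the standard definitions computed from a binary confusion matrix (1 = binding site, 0 = non-site).

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding site, 0 = non-site)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true binding sites that were recognized."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical, imbalanced toy labels (2 sites, 6 non-sites); not data from the thesis.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), matthews_cc(tp, tn, fp, fn))
```

On an imbalanced data set, specificity can stay high even when half of the true sites are missed, which is why the Matthews correlation coefficient, which uses all four confusion-matrix counts, is a useful complement.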
Keywords/Search Tags: transcription factor binding sites, DNase-seq, sequencing depth, AutoEncoder