Research on Methods for Predicting DNA-Protein Binding Sites Based on DNase Signal

Posted on: 2016-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Lian
GTID: 2310330542474001
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Human cells contain many functional proteins that bind to genes and act on them. Accurately identifying the binding sites of these proteins is an important and difficult research problem in the life sciences. ChIP-Seq (chromatin immunoprecipitation followed by sequencing), also known as binding-site analysis, is a mature technique in this field, but it has shortcomings. First, the antibody used for immunoprecipitation is protein-specific, so experiments fail for proteins for which no suitable antibody exists. Second, a single experiment can detect only one kind of protein; the time and cost involved prevent the technique from being used on a large scale. Third, and more importantly, the DNA fragments obtained are long and only their ends are sequenced, so the reads cover only part of each fragment and do not mark the precise binding site. Therefore, although the sequencing data themselves have single-base resolution, the localization resolution for the target protein's binding sites is at best tens of bases.

To compensate for these defects of ChIP-Seq, we adopted a newer technology for detecting DNA-protein binding sites: DNase-Seq. DNase-Seq has several advantages over ChIP-Seq. First, because DNase-Seq is not protein-specific, it can probe the binding sites of all proteins across the whole genome in a single experiment, which greatly improves efficiency, reduces cost, and makes large-scale detection of DNA-protein binding sites possible. Second, and more importantly, because each DNase-Seq read starts at the position where the enzyme cut, DNA-protein binding sites can be detected at single-base resolution.

In this research, we first apply GEM to ChIP-Seq high-throughput sequencing data to obtain confirmed protein binding sites. Next, we extract the base sequence at those binding sites together with the DNase-Seq sequencing data and use them as training data for our binding-site prediction model. Finally, we use the trained model to predict protein binding sites within the open regions of the genome. The experimental results show that our data preprocessing method effectively highlights the information that identifies DNA-protein binding sites and achieves its intended purpose. The prediction model combines sequence bias with DNase-Seq information and performs well in binding-site prediction experiments, which demonstrates the effectiveness of our method.
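The step of turning DNase-Seq reads and the base sequence at a site into training data can be illustrated with a minimal sketch. It is not the thesis's actual preprocessing: the function names, the 0-based half-open read coordinates, the window size, and the choice of one-hot sequence encoding plus raw per-base cut counts are all assumptions made for illustration. It relies only on the point stated above, that the cut position is the start (5' end) of each read.

```python
from collections import Counter

# Hypothetical helpers; names and conventions are illustrative assumptions.

def cut_position(start, end, strand):
    """Return the 5' end of an aligned read, taken as the DNase cut site.

    Assumes 0-based, half-open [start, end) coordinates, so the 5' end of a
    reverse-strand read is its last aligned base, end - 1.
    """
    return start if strand == "+" else end - 1


def cut_profile(reads, center, flank):
    """Count cuts at each base in the window [center - flank, center + flank]."""
    counts = Counter(cut_position(s, e, strand) for s, e, strand in reads)
    return [counts[pos] for pos in range(center - flank, center + flank + 1)]


ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def encode_sequence(seq):
    """One-hot encode a DNA string; unknown bases such as N become all zeros."""
    encoded = []
    for base in seq.upper():
        encoded.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return encoded


def site_features(reads, seq_window, center, flank):
    """Concatenate sequence encoding and cut counts for one candidate site."""
    if len(seq_window) != 2 * flank + 1:
        raise ValueError("sequence window must cover exactly 2 * flank + 1 bases")
    return encode_sequence(seq_window) + cut_profile(reads, center, flank)


# Toy reads as (start, end, strand) on a single chromosome; made-up values.
reads = [(98, 134, "+"), (100, 136, "+"), (70, 103, "-"), (100, 130, "+")]
features = site_features(reads, "ACGTN", center=100, flank=2)
```

A vector like `features` could be computed for each GEM-derived site (positive examples) and for background positions, then passed to any classifier; which model and which negative set to use is left open here.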
Keywords/Search Tags:DNA-protein binding sites, DNase-Seq, ChIP-Seq, Prediction model