Analysis Of DNA Protein Binding Sites Based On DNase High Throughput Sequencing Information

Posted on: 2017-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: P C Sang
GTID: 2310330518487920
Subject: Control Science and Engineering
Abstract/Summary:
Proteins are essential to biological activity, and those that bind DNA play a decisive role in gene expression and regulation. To analyze these proteins further, their DNA-protein binding sites must be located and identified, which is the key difficulty addressed in this study. In recent years the main method for identifying DNA-protein binding sites has been ChIP-Seq, which combines chromatin immunoprecipitation with high-throughput sequencing. However, ChIP-Seq is resource-intensive and has low accuracy, and several of its defects cannot be overcome. DNase-Seq, which identifies DNA-protein binding sites from DNase high-throughput sequencing information, has therefore gradually come into use. Because its experimental principle is not specific to a particular protein, DNase-Seq can in theory overcome the shortcomings of ChIP-Seq, and it has become the first choice for studying DNA binding sites. This thesis describes how to use DNase-Seq to identify DNA binding sites.

In this study, we first obtain the experimental data in the open area, a stage that relies mainly on ChIP-Seq. We use GEM to obtain a set of binding sites for a given protein, then extract the DNase high-throughput sequencing information, that is, the DNase-Seq data, at these binding sites. Alignment, filtering, and removal of interfering signals then yield the training data. From the training data we extract features and construct a recognition algorithm based on DNase data. Finally, we combine the DNase-data algorithm with the Seq-data algorithm to obtain a recognition model based on DNase-Seq information.

To verify the prediction model, we used the area under the ROC curve to measure classification performance. We evaluated the DNase data model, the Seq data model, and the combined DNase-Seq data model. The results show that the model based on DNase data classifies very well, which is a breakthrough in research on DNase data. Comparison with the other two models shows that combining the traditional Seq data model with the DNase data model presented in this study greatly improves the classification results. This supports the accuracy and reliability of the final prediction model based on DNase-Seq data.
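The evaluation step described in the abstract, comparing three models by area under the ROC curve, can be sketched as follows. This is a minimal illustration, not the thesis's implementation. The function names `roc_auc` and `combine_scores`, the equal-weight averaging used to combine the two models, and all numbers are assumptions invented for the example. The abstract does not say how the DNase and Seq models are combined, and the toy scores are not results from the study.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the Mann-Whitney statistic.

    Returns the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; tied pairs count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine_scores(a: Sequence[float], b: Sequence[float],
                   weight: float = 0.5) -> List[float]:
    """Hypothetical combination rule: weighted average of two models' scores."""
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


# Toy data, invented for illustration only (1 = bound site, 0 = unbound site).
labels = [1, 1, 1, 0, 0, 0]
dnase_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # scores from a DNase-based model
seq_scores = [0.6, 0.8, 0.3, 0.4, 0.7, 0.2]    # scores from a sequence-based model
combined_scores = combine_scores(dnase_scores, seq_scores)

for name, scores in [("DNase", dnase_scores),
                     ("Seq", seq_scores),
                     ("DNase-Seq", combined_scores)]:
    print(f"{name}: AUC = {roc_auc(labels, scores):.3f}")
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch. For genome-scale site lists a rank-based computation or a library routine would be the more practical choice.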
Keywords/Search Tags:DNA-protein binding sites, DNase-Seq, ChIP-Seq, Prediction model