
Prediction Of DNA-binding Sites With K-Spaced Amino Acid Composition

Posted on: 2016-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z P Zhang
GTID: 2180330470950372
Subject: Software engineering
Abstract:
The interactions between proteins and DNA are closely related to the activities of living organisms. In the early 21st century, when the draft of the human genome was completed, scientists found that protein-coding genes account for only about 2% of the human genome, while the function of the remaining 98% is not yet clear. Therefore, to understand gene expression and regulation in depth, we must study the mechanisms by which nucleic acid molecules and proteins interact. Through sustained research, scientists found that DNA fragments not only encode proteins but can also bind specific proteins, thereby regulating gene activity. The mechanisms of interaction between these biological molecules are thus the basis of life processes and can reveal the essence of life phenomena. Because proteins are the carriers of biological activity and DNA transmits genetic information, protein-DNA interactions play a key role in life activities such as DNA replication and recombination. These activities involve specific proteins and are regulated by protein-DNA interactions; the proteins that interact with DNA are called DNA-binding proteins.

With the rapid development of bioinformatics, predicting DNA-binding sites by combining computational techniques with mathematical principles has become an active research topic.
In this thesis, we apply bioinformatics methods and mathematical principles to a large amount of data on a high-performance computing platform to build an efficient and accurate predictor, which obtains results with less effort than traditional experiments and avoids their long duration and high cost.

Specifically, we use a new bioinformatics tool, the K-spaced amino acid pair encoding, together with a support vector machine (SVM) to predict DNA-binding sites. On two datasets, PDNA62 and PDNA224, we evaluated different values of the window length and K, and comparisons with other classifiers show that our method is effective and can provide guidance for DNA-binding site prediction.

The experimental results show that using K-spaced amino acid pairs to predict protein-DNA binding sites is effective, because the method is based on protein sequences and not only considers the information of the 20 kinds of amino acids but also retains interaction information among local amino acids. With an SVM as the classifier, the method achieves an accuracy of 78.38%, a sensitivity of 76.86%, a specificity of 79.86%, and an MCC of 0.5691 on the PDNA62 dataset, and an accuracy of 87.07%, a sensitivity of 81.4%, a specificity of 92.75%, and an MCC of 0.7462 on the PDNA224 dataset. It outperformed the eight other models compared, which demonstrates the effectiveness of our method and its value as guidance for future biological experiments.
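To make the encoding and the reported metrics concrete, here is a minimal Python sketch of one common formulation of the composition of k-spaced amino acid pairs (CKSAAP), applied to a sliding window centered on a residue, together with the standard definitions of accuracy, sensitivity, specificity and MCC. The thesis's exact window handling, normalization and SVM settings are not given in this abstract, so the 'X' padding at sequence ends, the per-k normalization by the number of pairs, and the helper names `extract_window`, `cksaap` and `evaluate` are assumptions for illustration only. The resulting feature vectors would then be passed to an SVM (for example scikit-learn's `SVC`), which is omitted here to keep the sketch dependency-free.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids; the 400 ordered pairs index the features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def extract_window(sequence: str, center: int, window: int) -> str:
    """Return the segment of odd length `window` centered on residue `center`.

    Positions beyond either end of the sequence are padded with 'X'
    (an assumption; padding schemes vary between studies).
    """
    half = window // 2
    padded = "X" * half + sequence + "X" * half
    # Residue `center` sits at padded index center + half, so the window
    # spans padded[center : center + 2 * half + 1].
    return padded[center:center + window]


def cksaap(segment: str, k_max: int) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each k, count every pair (segment[i], segment[i + k + 1]) of standard
    residues and divide by the number of pairs with that gap, giving 400
    features per k and 400 * (k_max + 1) features in total.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = len(segment) - k - 1  # number of residue pairs with gap k
        for i in range(max(total, 0)):
            pair = segment[i] + segment[i + k + 1]
            if pair in PAIR_INDEX:  # skip pairs that contain padding 'X'
                counts[PAIR_INDEX[pair]] += 1
        features.extend(c / total if total > 0 else 0.0 for c in counts)
    return features


def evaluate(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    seq = "MKRESHKHAEQARRNRLAVALHELASLIP"  # hypothetical toy sequence
    segment = extract_window(seq, center=2, window=7)
    vec = cksaap(segment, k_max=3)
    print(segment, len(vec))
    print(evaluate(tp=40, tn=45, fp=10, fn=5))  # hypothetical confusion counts
```

With a window length W and maximum gap K, each residue yields 400 × (K + 1) features regardless of W, which is why the window length and K can be tuned independently when selecting parameters on PDNA62 and PDNA224.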
Keywords: Protein-DNA interactions, binding sites, K-spaced amino acid pairs, SVM