A Comprehensive Feature Analysis Of Protein-Nucleic Acid Interactions And An Improved Prediction Protocol For DNA-binding Proteins And RNA-binding Proteins

Posted on: 2014-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: C X Zou
GTID: 2230330395477418
Subject: Pharmacy
Abstract/Summary:
DNA-binding proteins (DNA-BPs) and RNA-binding proteins (RNA-BPs) are nucleic acid-binding proteins that are pivotal to cell function, including gene regulation and transcription, DNA replication and repair, DNA packaging and recombination, chromatin and ribosome formation, and other fundamental activities associated with human life. With the arrival of the post-genome era and the implementation of large-scale genome sequencing projects, many protein sequence databases and many crystal structures of protein-DNA/RNA complexes are now available. Annotating these enormous numbers of sequences has been one of the most important and demanding tasks in bioinformatics, and it is a key step for further analysis. However, the mechanisms of protein-DNA/RNA recognition are complicated and remain largely unknown. We therefore investigated the existing large datasets using bioinformatics methods, which may provide useful insights into the mechanisms underlying protein-nucleic acid interactions. The major task of this work is to recognise which proteins can interact with DNA or RNA, and the focus is on how to transform protein sequences appropriately into a uniform numeric representation. To develop good predictive models for DNA-BPs and RNA-BPs, we adopted machine learning methods to carry out a comprehensive feature analysis and a systematic investigation of combinations of various descriptors from three main levels of the protein sequence (global, nonlocal and local), and we exhaustively evaluated their performance. In five-fold cross-validation on the DNAdset, we obtained an overall accuracy of 0.940 and a Matthews correlation coefficient (MCC) of 0.881. We then applied the same method to the prediction of RNA-BPs and compared the differences between DNA-BPs and RNA-BPs. The present work thus yields a new sequence-based prediction method for DNA-BPs and RNA-BPs that uses a support vector machine (SVM) and comprehensive feature analysis.
Such a method can provide guidance for biological experiments in this area and is helpful for further research into interaction mechanisms. The good results also suggest that an entirely sequence-based protocol, which transforms and integrates informative features from different scales as input to an SVM, can predict DNA-BPs and RNA-BPs accurately. Moreover, a novel systematic framework for sequence descriptor-based prediction of protein function is proposed, which provides a new idea for achieving the ultimate goal of bioinformatics.
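
As a rough illustration of two ideas mentioned in the abstract, the sketch below turns a protein sequence into a fixed-length numeric vector and computes accuracy and MCC from binary predictions. It is not the thesis's protocol: the abstract does not name its descriptors, so amino acid composition is assumed here as one simple example of a global descriptor, and the function names `amino_acid_composition` and `accuracy_and_mcc` are hypothetical.

```python
import math

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Assumed example of a global descriptor; non-standard residues are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and Matthews correlation coefficient for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    vector = amino_acid_composition("AAAC")
    print(len(vector), vector[0], vector[1])
    print(accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Vectors of this kind, possibly concatenated with nonlocal and local descriptors, could serve as input to an SVM, with the two metrics computed on each held-out fold of a cross-validation.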
Keywords: Computer-aided drug design, Bioinformatics, DNA-binding proteins, RNA-binding proteins, Machine learning