Prediction of Protein-ATP Binding Sites Based on Support Vector Regression Integration

Posted on: 2016-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: J H Yu
Full Text: PDF
GTID: 2270330461982969
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As the physical basis and carrier of life phenomena, proteins play an important role in life processes. Bioinformatics research has helped deepen our understanding of life processes, improve medical care, and enhance quality of life, and it has received extensive attention from scholars both at home and abroad.
The basic units of proteins are amino acids. Amino acid sequences link together to form a peptide chain, which folds in space into a specific three-dimensional structure. Differences in amino acid sequence determine the diversity of protein structure, and protein function depends on protein structure. Biologically, protein function is embodied in binding, that is, the biochemical combination of a protein with various ligands. Ligands can be adenosine 5’-triphosphate (ATP), vitamins, metal ions, or drug molecules, and each kind of binding plays a corresponding role in life processes. ATP is an important molecule in cell biology that plays an important role in membrane transport, cellular motility, muscle contraction, signaling, replication and transcription of DNA, and various metabolic processes. The residues at which a protein binds ATP are closely related to the structure of the protein.
With the Human Genome Project (HGP) in the 1990s, the number of known protein sequences grew explosively, marking the entry into the post-genome era. The rapid development of protein sequencing technology has accumulated a large amount of unannotated protein sequence data. Traditional biological experiments are labor-intensive, expensive, and time-consuming, and they encounter many practical difficulties. Predicting the binding relationship between proteins and ligands from known protein sequences is therefore an important task.
In protein-ATP binding residue prediction, the number of binding residues (positive samples) is far smaller than the number of non-binding residues (negative samples), so the task is an imbalanced binary classification problem. Inspired by both the machine learning view that classification can be regarded as a special case of regression and the characteristics of this bioinformatics problem, this thesis proposes a novel protein-ATP binding residue prediction method: a support vector regression (SVR) ensemble model. The central idea is as follows. First, a sliding window is used to extract features for every residue in the protein sequences, which yields imbalanced two-class samples. Second, a random under-sampling strategy is used to eliminate the significant imbalance between positive and negative samples. Finally, an ensemble of SVR prediction models, combined with a suitable threshold, distinguishes protein-ATP binding residues from non-binding ones. The innovation of this study is therefore to use a regression method to perform classification. Experimental results on a standard data set validate the effectiveness of the proposed method in comparison with several state-of-the-art related methods and several common classifiers.
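The three steps described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: the function names, the zero padding at sequence ends, the 1:1 under-sampling ratio, the averaging of member scores, and the default threshold of 0.5 are all hypothetical choices that the abstract does not specify. The base regressor is passed in as a factory so that any object with `fit` and `predict` methods can be used.

```python
import random


def sliding_windows(per_residue_features, window_size, pad_value=0.0):
    """Build one flattened feature vector per residue from a centered window.

    Assumption: positions beyond the sequence ends are padded with pad_value.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    dim = len(per_residue_features[0])
    pad = [pad_value] * dim
    padded = [pad] * half + list(per_residue_features) + [pad] * half
    windows = []
    for i in range(len(per_residue_features)):
        flat = []
        for row in padded[i:i + window_size]:
            flat.extend(row)
        windows.append(flat)
    return windows


def random_under_sample(X, y, rng):
    """Keep all positives and an equally sized random subset of negatives.

    Assumption: a 1:1 ratio; the abstract does not state the ratio used.
    """
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept_neg = rng.sample(neg, min(len(pos), len(neg)))
    idx = pos + kept_neg
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def train_ensemble(X, y, make_regressor, n_members, seed=0):
    """Train each member on its own randomly under-sampled training set."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        X_sub, y_sub = random_under_sample(X, y, rng)
        model = make_regressor()  # hypothetical factory, e.g. one SVR per call
        model.fit(X_sub, y_sub)
        members.append(model)
    return members


def predict_binding(members, X, threshold=0.5):
    """Average the members' regression scores, then threshold into 0/1 labels.

    Assumption: simple averaging and a threshold of 0.5; in practice the
    threshold would be tuned on validation data.
    """
    scores = [0.0] * len(X)
    for model in members:
        for i, score in enumerate(model.predict(X)):
            scores[i] += float(score) / len(members)
    labels = [1 if s >= threshold else 0 for s in scores]
    return labels, scores
```

With scikit-learn installed, `make_regressor` could for example be `lambda: SVR(kernel="rbf")` using `sklearn.svm.SVR`, whose `fit` and `predict` methods match the interface assumed here. The kernel and its hyperparameters are again assumptions rather than values taken from the thesis.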
Keywords/Search Tags:Protein-ATP binding, Random Under-Sampling, SVR, Classifier Ensemble