Predicting Protein Protein Interactions And Its Active Sites Based On Data Mining Algorithm

Posted on:2012-02-07

Degree:Master

Type:Thesis

Country:China

Candidate:Y N Zhang

GTID:2210330362959213

Subject:Pattern Recognition and Intelligent Systems

Abstract/Summary:

With the rapid development of high-throughput sequencing technology, the number of known protein sequences is growing exponentially. However, the functions and interactions of many of these proteins remain unknown. A pressing issue is therefore how to analyze protein characteristics and interactions more quickly, and how to annotate active sites and their functions effectively. At the same time, the rapid development of computer technology has provided a solid foundation for molecular biology research. Because research on protein-protein interactions and active sites must deal with massive data sets, analyzing these data with data mining theory and revealing the natural laws behind them has become a cutting-edge research field in proteomics and computational biology.

In recent decades, a great many data mining methods for analyzing protein sequences have been proposed, and they have attracted sustained attention from researchers. In this dissertation, we develop more effective data mining methods for accurately predicting protein-protein interactions and protein active sites. We also provide a corresponding standalone algorithm package and online websites for these algorithms. The main contributions of this dissertation are as follows.

First, we propose a novel approach to predicting protein-protein interactions based on a compressive sampling algorithm. We first extract distinctive features from protein sequences. The original high-dimensional sequence feature vector is then compressed into a much lower-dimensional but more condensed space by exploiting the sparsity of the original signal. We compare the compressive sampling method with other traditional dimension reduction methods and demonstrate its efficiency. We then construct support vector machine and rotation forest models in the compressed feature domain and verify that these models effectively avoid overfitting. Finally, we discuss the impact of imbalanced datasets and of different strategies for constructing the negative dataset.

Second, we propose a novel approach to predicting protein active sites through bi-profile sampling and the jackknife test. We first extract sequence conservation features and preprocess them to avoid overfitting. We then use the bi-profile sampling method to re-encode amino acid composition, protein secondary structure, protein disorder information, and the solvent accessibility of amino acids. Finally, we compare model performance in predicting protein active sites across different feature combinations, algorithms, and ensemble strategies. We also study the robustness of these models on imbalanced datasets.
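
As a rough illustration of the compression step in the first contribution, the sketch below projects a sparse feature vector onto a random Gaussian measurement matrix, which is one common form of compressive sampling. The abstract does not state which measurement matrix, feature dimension, or compressed dimension the dissertation uses, so the Gaussian matrix, the sizes n = 400 and m = 40, the example vector, and the function names make_measurement_matrix and compress are all assumptions made for illustration, not the author's implementation.

```python
import math
import random


def make_measurement_matrix(m, n, seed=0):
    """Return an m x n random matrix with entries drawn from N(0, 1/m).

    Assumption: a Gaussian measurement matrix; the dissertation may use another.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)  # standard deviation, so the variance is 1/m
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def compress(x, phi):
    """Compute y = phi * x, mapping an n-dimensional vector to m dimensions."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical sparse sequence feature vector: 400 entries, only 4 non-zero.
n, m = 400, 40
x = [0.0] * n
for idx, val in [(3, 1.5), (57, -0.8), (210, 2.0), (399, 0.4)]:
    x[idx] = val

phi = make_measurement_matrix(m, n, seed=42)
y = compress(x, phi)
print(len(x), "->", len(y))
```

In a pipeline like the one described above, the compressed vectors would then be passed to a classifier such as a support vector machine; that step is omitted here.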

Keywords/Search Tags:

Protein-protein interactions, protein active sites, support vector machine, rotation forest, ensemble learning, feature selection and extraction, compressive sampling, Bi-profile sampling

Related items

1	Computational Prediction Of Protein-protein Interactions And Hot Spot Residues In Protein Interfaces
2	The Prediction Of Protein Interactions Based On Integrated Learning Model
3	Prediction Research Of Protein-Protein Interaction Based On Ensemble Of Support Vector Machine And Random Forest
4	Predicting Protein-protein Interactions From Protein Sequence Based On Multiple Feature Extractions
5	Detection Of Protein-protein Interaction Based On Rotation Forest
6	Characteristic Analysis And Prediction Of Protein-protein Interactions And Protein Interaction Sites
7	The Prediction Of Metal Ion-binding Site For Membrane Protein Based On Ensemble Learning
8	Prediction Of Protein - ATP Binding Sites Based On Support Vector Regression Integration
9	Research On Predicting Protein-protein Interactions Based On Machine Learning
10	Research On Prediction Of Protein-protein Interactions Based On Deep Neural Network And Ensemble Learning