
Enhancing protein fold prediction accuracy using new physicochemical-based features and fusion of heterogeneous classifiers

Posted on: 2012-07-19
Degree: M.Sc
Type: Thesis
University: Multimedia University (Malaysia)
Candidate: Dehzangi, Abdollah
GTID: 2450390008992807
Subject: Biogeochemistry
Abstract/Summary:
One of the most challenging research problems in bioinformatics is predicting the tertiary structure of a protein from its amino acid sequence. The task is difficult because little is known about protein structural stability or about how amino acids interact with each other along the sequence, and this has kept it an open research issue in bioinformatics and molecular biology.

Recently, owing to major advances in Pattern Recognition, Machine Learning, and Artificial Intelligence (AI), there has been great interest in applying intelligent approaches to the protein fold prediction problem. To enhance protein fold prediction accuracy with pattern recognition-based approaches, three factors should be considered: the prediction performance of the applied classifier, the discriminatory information in the extracted features, and the compatibility between the applied classifier and the extracted features. In this research, we aim to address the protein fold prediction problem with pattern recognition-based approaches, namely fusion methods and the extraction of new physicochemical-based features.

To explore the prediction performance of different classifiers on the protein fold prediction task, a comparative study of seven classifiers was conducted: Multi Layer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbor, C4.5, Naive Bayes, AdaBoost.M1, and LogitBoost. These classifiers were chosen for their popularity and the results they achieved in previous works.

Based on the findings of this comparative study, a new fusion of heterogeneous classifiers (AdaBoost.M1, LogitBoost, Naive Bayes, MLP, and SVM) has been proposed to tackle this problem.
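The abstract does not state the rule used to combine the outputs of the five base classifiers. As a minimal sketch, assuming simple plurality (majority) voting over predicted fold labels, the fusion step could look like the Python below. The function name `majority_vote`, the tie-breaking rule, and the `fold_A`/`fold_B`/`fold_C` labels are hypothetical and chosen only for illustration; this is not presented as the thesis's actual fusion method.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Fuse hard label predictions from several classifiers by plurality vote.

    predictions[c][i] is the fold label that classifier c assigns to protein i.
    Ties are broken in favour of the label voted for by the earliest classifier
    in the list (an arbitrary, assumed rule).
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every classifier must label the same samples")

    fused: List[str] = []
    for i in range(n_samples):
        votes = [p[i] for p in predictions]  # one vote per classifier
        counts = Counter(votes)
        top = max(counts.values())
        # The first vote (in classifier order) whose label has the top count wins.
        fused.append(next(v for v in votes if counts[v] == top))
    return fused


if __name__ == "__main__":
    # Hypothetical outputs of five base classifiers on three proteins.
    predictions = [
        ["fold_A", "fold_B", "fold_C"],  # AdaBoost.M1
        ["fold_A", "fold_C", "fold_C"],  # LogitBoost
        ["fold_B", "fold_B", "fold_A"],  # Naive Bayes
        ["fold_A", "fold_B", "fold_C"],  # MLP
        ["fold_C", "fold_B", "fold_C"],  # SVM
    ]
    print(majority_vote(predictions))  # ['fold_A', 'fold_B', 'fold_C']
```

Plurality voting needs only each classifier's hard label, which makes it easy to mix very different learners. A weighted vote or probability averaging would instead require comparable confidence scores from every base classifier.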
The proposed fusion method aims to enhance protein fold prediction accuracy by exploiting the discriminatory ability of different classifiers (diversity among the classifier ensemble) to improve overall performance, rather than relying on the strength of a single classifier. To the best of our knowledge, the proposed method achieves higher protein fold prediction accuracy than the other studies found in the literature.

In addition, two meta-classifiers, Rotation Forest and Random Forest, were employed to tackle the protein fold prediction problem. Our experimental results showed that these methods outperformed most of the works found in the literature while also reducing the time consumption of the task.

To explore the discriminatory power of features, new feature groups were extracted based on the physical and physicochemical properties of the amino acids. The effectiveness of the extracted feature groups was studied using three popular classifiers that consistently performed better than the other employed classifiers (MLP, SVM, and AdaBoost.M1). The results show that, considering the number of features used, the extracted features are more effective than the features proposed in previous works.

Finally, the proposed method was applied to different combinations of the extracted features to investigate the compatibility of the proposed classifier and the extracted features. Our experimental results show that using the proposed method with the combination of the new features enhances protein fold prediction accuracy more than using each feature group individually.
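The abstract does not say which amino acid properties or formulas the new feature groups are built from. As an illustration only, the sketch below computes one hypothetical physicochemical feature group (the mean of a per-residue property plus its lag-k autocorrelations along the sequence) and combines it with a second group by concatenation. The Kyte-Doolittle hydropathy scale, the autocorrelation form, amino acid composition as the second group, and all function names are assumptions made for this example.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy index, used here only as an example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def property_features(seq: str, scale: Dict[str, float], max_lag: int = 3) -> List[float]:
    """Hypothetical feature group: mean property value plus lag-1..max_lag autocorrelations."""
    values = [scale[aa] for aa in seq.upper() if aa in scale]  # skips non-standard residues
    n = len(values)
    if n <= max_lag:
        raise ValueError("sequence too short for the requested max_lag")
    features = [sum(values) / n]
    for k in range(1, max_lag + 1):
        # Average product of property values k residues apart along the chain.
        features.append(sum(values[i] * values[i + k] for i in range(n - k)) / (n - k))
    return features


def composition(seq: str) -> List[float]:
    """Relative frequency of each of the 20 standard amino acids."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def combine_feature_groups(*groups: List[float]) -> List[float]:
    """Concatenate several feature groups into one input vector for a classifier."""
    combined: List[float] = []
    for group in groups:
        combined.extend(group)
    return combined


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQ"  # arbitrary example sequence
    vector = combine_feature_groups(property_features(seq, KYTE_DOOLITTLE), composition(seq))
    print(len(vector))  # 4 hydropathy features + 20 composition features = 24
```

Concatenation is the simplest way to combine feature groups. It lets a single classifier see every group at once, at the cost of a longer input vector, which is one reason the number of features matters when comparing feature groups.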
Relative to their prediction performance, the proposed approaches also showed lower time consumption than the other methods that have been used to tackle the protein fold prediction problem.

In this study, a new fusion of heterogeneous classifiers and new physicochemical-based features have been proposed to tackle the protein fold prediction problem. The proposed approaches enhance prediction performance on the two most popular benchmarks, which have been widely used in previous works.
Keywords/Search Tags:Protein fold prediction, Features, Classifiers, Using, Previous works, Fusion, Proposed, Heterogeneous