A Study On Feature Extraction And Classification Algorithms For Protein Structural Class Prediction

Posted on: 2013-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Wu
GTID: 2230330371461818
Subject: Signal and Information Processing
Abstract/Summary:
With the development of human genome sequencing technology, a large gap has emerged between the sharply increasing number of known protein sequences and the slow accumulation of known higher-order protein structures and functions. Because few protein structures have been determined experimentally, it is valuable to find a reliable and effective computational approach to predicting protein structures and functions, which is one of the most fundamental and important tasks in computational molecular biology.

However, it is difficult to predict the three-dimensional structure of a protein directly from the amino acid sequence of its primary structure. This calls for an intermediate line of research ("inter-research") between protein sequences and structures. In bioinformatics, researchers have found that the protein structural class is very important for protein structure prediction, since the three-dimensional structure of a protein is formed from secondary structural elements. The predicted protein structural class is therefore regarded as this intermediate step, which helps in further understanding three-dimensional structures and their functions. As a branch of proteomics research, protein structural class prediction has recently become a hotspot that attracts growing attention from researchers.

Protein structural class prediction usually involves three aspects: feature extraction, feature selection from the feature set, and the classification algorithm used for prediction. Significant efforts have been made in this domain. From the perspective of information acquisition, however, most existing methods draw on a single type of information. Information extracted at different levels should be combined to predict the protein structural class, and the redundant information in the combined features should be removed. Motivated by this, we extract features using information processing methods, including the composition and position information of the amino acid sequence and features of the predicted secondary structural sequence. The features extracted from the different sequences are then combined into one feature set, from which a small number of features are selected as the classifier input. Finally, the protein structural class is predicted by an effective and efficient classification algorithm.

Given an amino acid sequence, we first transform it into a reduced amino acid sequence. We then calculate the word frequency and word position features of the protein primary sequence, the reduced sequence, and the predicted secondary structural sequence. A feature set is built by combining all of this information. Based on this feature set, we remove redundant information with the Random Forest method. Using Support Vector Machine, Neural Network, k-Nearest Neighbor, and Multiple Classifiers Combination, we predict the protein structural class and verify the validity of the methods proposed in this research.

The results demonstrate that: 1) compared with existing methods, the proposed novel method is efficient, which highlights the need for prediction methods to extract more useful information, and this understanding can guide the development of more powerful measures for protein structural class prediction; 2) with an effective feature selection method, more redundant features are removed, which improves classification accuracy.

From the bioinformatics perspective, this study uses information extraction and information combination to predict the protein structural class, which is helpful for research on protein structure and function, for machine learning, and for protein sequence analysis.
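The feature extraction step described above (reduce the sequence, then compute word frequency and word position features for the primary, reduced, and predicted secondary structural sequences) might look roughly like the following minimal sketch. Several details are assumptions, since the abstract does not specify them: the four-group reduced alphabet, the word length `k=2`, the three-state secondary structure alphabet `HEC`, the definition of the position feature as a mean normalized start index, and all function names are hypothetical and chosen only for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 4-letter reduced alphabet (illustrative grouping, not from the thesis).
REDUCED_GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar / small
    "b": "KRH",       # basic
    "a": "DE",        # acidic
}
RESIDUE_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group letter; unknown residues become 'x'."""
    return "".join(RESIDUE_TO_GROUP.get(aa, "x") for aa in seq.upper())


def word_features(seq, alphabet, k=2):
    """Return [frequency, mean relative position] for every k-letter word.

    Frequency is count / number of overlapping windows. Position is the mean
    1-based start index of the word divided by the number of windows, so an
    absent word (0.0) is distinguishable from a word at the first window.
    """
    words = ["".join(w) for w in product(alphabet, repeat=k)]
    counts = dict.fromkeys(words, 0)
    pos_sums = dict.fromkeys(words, 0)
    n_windows = max(len(seq) - k + 1, 0)
    for i in range(n_windows):
        w = seq[i:i + k]
        if w in counts:  # windows containing unknown symbols are skipped
            counts[w] += 1
            pos_sums[w] += i + 1
    features = []
    for w in words:
        freq = counts[w] / n_windows if n_windows else 0.0
        pos = pos_sums[w] / (counts[w] * n_windows) if counts[w] else 0.0
        features.extend([freq, pos])
    return features


def combined_features(primary, secondary, k=2):
    """Concatenate features of the primary, reduced, and secondary sequences."""
    primary = primary.upper()
    reduced = reduce_sequence(primary)
    return (
        word_features(primary, AMINO_ACIDS, k)
        + word_features(reduced, "".join(REDUCED_GROUPS), k)
        + word_features(secondary.upper(), "HEC", k)
    )
```

With `k=2` this yields a fixed-length vector of 850 values per protein (800 from the primary sequence, 32 from the reduced sequence, 18 from the secondary structural sequence), which is the kind of combined feature set that a feature selection step and a classifier would then consume.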
Keywords/Search Tags: protein secondary structural class, feature extraction, random forest, feature selection, machine learning, multiple classifiers combination