
Feature Extraction, Characterization, and Classification of Proteins Using Random Forest

Posted on: 2017-12-23
Degree: Ph.D
Type: Dissertation
University: North Carolina Agricultural and Technical State University
Candidate: Ismail, Hamid D
Full Text: PDF
GTID: 1458390008950689
Subject: Bioinformatics
Abstract/Summary:
Machine learning algorithms have been widely used in bioinformatics to develop computational tools, and their use continues to grow with the increasing volume of data, the availability of computational resources, and the invention of newer machine learning algorithms. The central task in this approach is to fit models to experimentally pre-classified data and then to use these models to predict the class of an unclassified instance. Since the advent of whole genome sequencing, protein sequences have been increasingly deposited and classified in databases. The objectives of this dissertation are to develop a computational tool for protein feature extraction and to implement random forest based algorithms to solve various bioinformatics problems. The project is motivated by the gap in existing feature extraction tools and the need to improve some current prediction methods. Four tools were developed. The first is the Feature Extraction from Protein Sequence tool (FEPS), an easy-to-use web-based tool that computes the most common protein features and provides them in different output file formats. The other three tools are RF-NR, RF-Phos, and RF-Hydroxysite. RF-NR predicts the subfamilies of nuclear receptor proteins, which represent a large protein superfamily, while RF-Phos and RF-Hydroxysite predict the sites of post-translational phosphorylation and hydroxylation, respectively, in protein sequences. These methods were validated and tested rigorously with both cross validation and independent samples. Compared with existing tools, our new bioinformatics tools perform equally well or better. These tools are available online at the Bioinformatics and Computational Biology Lab's website at bcb.ncat.edu.
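To illustrate what extracting features from a protein sequence can look like, the following is a minimal sketch that computes amino acid composition, a commonly used sequence feature. It is an assumption that this resembles any feature computed by FEPS; the function name `amino_acid_composition` and the example sequence are hypothetical and do not come from the dissertation.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every
# sequence maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration only; non-standard
    residue codes (e.g. X, B, Z) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up example sequence, not taken from any dataset in the source.
features = amino_acid_composition("MKTAYIAKQR")
```

A fixed-length vector of this kind could serve as one row of a training matrix for a random forest classifier (for example, scikit-learn's `RandomForestClassifier`), with the experimentally determined class as the label. The abstract does not state which implementation the author used.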
Keywords/Search Tags:Tools, Feature extraction, Protein, Bioinformatics, Computational