Research on Protein Classification Based on Ensemble Learning and Multi-label Learning

Posted on: 2015-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: W C Chen
GTID: 2250330428461176
Subject: Computer application technology
Abstract/Summary:
With the overwhelming growth of biological data, relying on traditional biological experiments alone to determine protein structures and other properties requires a great deal of manpower, resources, and time. Building "in silico" methods that predict protein properties, and thereby reduce the cost of biological experiments, is therefore a meaningful topic. At the same time, the rapid development of machine learning keeps widening its range of applications, and biology in particular offers large volumes of data to which machine learning can be applied.

The main contents of this paper are as follows:

(1) Introducing two methods for protein feature extraction. We apply two effective ways of extracting features for protein classification. The first is a mixed feature combining amino acid composition and physicochemical properties, totaling 188 dimensions. The second is based on position-specific scoring matrices (PSSM), which capture protein homology information, totaling 20 dimensions. Each method has its own pros and cons: the former is faster to extract at the expense of lower accuracy, while the latter takes longer but yields higher prediction accuracy.

(2) Proposing a method for protein fold classification based on ensemble classifiers. Protein structure prediction is an important topic in bioinformatics, and protein fold identification plays a key role in it. Because the accuracy of previous models for protein fold classification is quite low, we introduce an ensemble classifier. It is based on a voting mechanism, and by integrating the outcomes obtained by the two base classifiers, its final result achieves the best accuracy on the common data set.

(3) Proposing a 2-layer enzyme classification model based on multi-label learning. The first layer determines whether a protein is an enzyme, while the second predicts the functions of the enzyme. Multifunctional enzymes are a particularly tricky case for enzyme classification because of their special properties. We apply multi-label classification, a machine learning technique, to the multifunctional enzyme classification problem, which previous researchers have not addressed, and achieve good classification results.

(4) Developing an online prediction platform for protein fold prediction called PPL, and another for enzyme prediction called IME. In addition to online prediction, both PPL and IME provide programs for local experiments. The data sets are also available for download so that users can easily access our data and do further research.
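
The feature extraction, voting, and two-layer ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not give the voting rule, the base classifiers, or the multi-label method, so the sketch assumes soft voting (averaging class probabilities) and a per-function threshold on independent scorers. All function names, parameters, and the 0.5 threshold are hypothetical. The composition function covers only the amino acid composition part of the 188-dimensional mixed feature, not the physicochemical properties or the PSSM feature.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the 20 residue frequencies of a protein sequence.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def soft_vote(probabilities_per_classifier):
    """Average per-class probabilities from several base classifiers.

    Assumed voting rule: the class with the highest mean probability wins.
    Returns (winning_class_index, averaged_probabilities).
    """
    n_classifiers = len(probabilities_per_classifier)
    n_classes = len(probabilities_per_classifier[0])
    averaged = [
        sum(p[c] for p in probabilities_per_classifier) / n_classifiers
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


def two_layer_predict(features, is_enzyme, function_scorers, threshold=0.5):
    """Layer 1 decides enzyme vs. non-enzyme; layer 2 assigns function labels.

    is_enzyme: hypothetical callable, features -> bool.
    function_scorers: hypothetical dict, label -> callable(features) -> score.
    Returns None for a non-enzyme, otherwise the set of predicted labels,
    which may hold several labels for a multifunctional enzyme.
    """
    if not is_enzyme(features):
        return None
    return {
        label
        for label, scorer in function_scorers.items()
        if scorer(features) >= threshold
    }
```

In this sketch the second layer can return more than one label for a single protein, which is what distinguishes the multi-label setting from ordinary single-label classification.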
Keywords: protein classification, enzyme classification, multi-label learning