
Oxidoreductase Classification Prediction Based On HOG-Cos-PSSM Feature Description And TKSE Ensemble Method

Posted on: 2020-01-08  Degree: Master  Type: Thesis
Country: China  Candidate: X W Yang
GTID: 2370330575989337  Subject: Computer technology
Abstract/Summary:
In recent years, as the importance of oxidoreductases in the biomedical field has become increasingly recognized, they have attracted growing attention in drug research and disease diagnosis. Oxidoreductases have many subclasses with different functions, so classifying them accurately is an important task in bioinformatics. Traditional biological methods are time-consuming and costly. An effective machine learning method that works from the sequence would therefore be very helpful for research on subfamily classification. With the development of machine learning and bioinformatics, research on recognizing protein sequences with computer algorithms has made great progress. Since enzymes are essentially proteins, this thesis builds an efficient and accurate prediction method by describing the features of enzyme protein sequences and combining them with machine learning algorithms for classification and prediction. The study is divided into two parts: improving the method for describing the features of enzyme protein sequences, and constructing the prediction classifier. The HOG-Cos-PSSM feature description method and the TKSE ensemble classification framework are proposed for these two parts.

First, existing feature description methods are summarized. To address the loss of positional information in the feature matrix of enzyme protein sequences, the concepts of Histogram of Oriented Gradients (HOG) and cosine similarity are introduced, and HOG-PSSM (Histogram of Oriented Gradients PSSM) and Cos-PSSM (cosine similarity PSSM) are proposed, where PSSM denotes the position-specific scoring matrix. After correlation analysis, the two are fused to give the HOG-Cos-PSSM feature description method. Experiments show that HOG-Cos-PSSM effectively improves on the existing feature description methods.

Second, for the classifier, a multi-classifier ensemble method is used. This method introduces the concepts of "ability area" and "area selection and ensemble" and proposes the TKSE ensemble classification framework. First, the t-distributed stochastic neighbor embedding (t-SNE) algorithm and the K-means clustering algorithm are used to divide the sample space into "ability areas". The base classifiers in each area are then filtered, and the Stacking framework is used to form an ensemble classifier for each area. Finally, each test sample is classified by the ensemble classifier of the area to which it is most similar. A large number of experimental results confirm that the prediction performance of the TKSE ensemble classification framework is significantly higher than that of each base classifier. Used together with the HOG-Cos-PSSM feature method, it raises the oxidoreductase classification prediction accuracy to 95.87%. The HOG-Cos-PSSM feature description method and the TKSE ensemble classification framework proposed in this thesis effectively improve the classification prediction accuracy for oxidoreductases and are an effective complement to existing prediction methods.
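
The abstract does not give the exact definitions of HOG-PSSM and Cos-PSSM, so the following is only a minimal sketch of how such fixed-length features might be computed from an L x 20 PSSM. It rests on several assumptions that are not taken from the thesis: Cos-PSSM is read as the cosine similarity between every pair of PSSM columns, HOG-PSSM is simplified to gradient-orientation histograms over a few row blocks, and fusion is plain concatenation (the thesis fuses after a correlation analysis that is not described here). The function names and the parameters `n_row_blocks` and `n_bins` are hypothetical.

```python
import numpy as np


def cos_pssm(pssm):
    """Assumed Cos-PSSM: cosine similarity between each pair of PSSM columns."""
    pssm = np.asarray(pssm, dtype=float)
    norms = np.linalg.norm(pssm, axis=0)
    norms[norms == 0] = 1.0                  # guard against all-zero columns
    unit = pssm / norms                      # columns scaled to unit length
    sim = unit.T @ unit                      # 20 x 20 similarity matrix
    upper = np.triu_indices(sim.shape[0], k=1)
    return sim[upper]                        # 190 values for 20 columns


def hog_pssm(pssm, n_row_blocks=4, n_bins=9):
    """Assumed, simplified HOG-PSSM: orientation histograms per row block."""
    pssm = np.asarray(pssm, dtype=float)
    g_row, g_col = np.gradient(pssm)         # gradients along rows and columns
    magnitude = np.hypot(g_row, g_col)
    # Unsigned orientation, folded into the range [0, pi].
    angle = np.mod(np.arctan2(g_row, g_col), np.pi)
    features = []
    mag_blocks = np.array_split(magnitude, n_row_blocks, axis=0)
    ang_blocks = np.array_split(angle, n_row_blocks, axis=0)
    for mag, ang in zip(mag_blocks, ang_blocks):
        # Magnitude-weighted histogram of orientations within this block.
        hist, _ = np.histogram(ang, bins=n_bins, range=(0, np.pi), weights=mag)
        total = hist.sum()
        features.append(hist / total if total > 0 else hist)
    return np.concatenate(features)          # n_row_blocks * n_bins values


def hog_cos_pssm(pssm):
    """Assumed fusion: concatenate the two descriptors."""
    return np.concatenate([hog_pssm(pssm), cos_pssm(pssm)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_pssm = rng.integers(-5, 6, size=(50, 20))   # synthetic stand-in PSSM
    print(hog_cos_pssm(toy_pssm).shape)
```

Under these assumptions the descriptor has the same length for sequences of any length, which is what allows a standard classifier to be trained on it. The row blocks are one simple way to retain coarse positional information along the sequence.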
Keywords/Search Tags: oxidoreductases, HOG-Cos-PSSM, TKSE, ability area, t-SNE