
Research And Application Of Lifelong Machine Learning On Protein Classification Problem

Posted on: 2021-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: L W Yang
GTID: 2370330623467773
Subject: Computer Science and Technology
Abstract/Summary:
Bioinformatics permeates all aspects of life science. Using computers to classify proteins quickly has long been an active topic in computer science and bioinformatics research, yet effective incremental learning for multi-class protein classification has remained a research gap. Incremental learning is also one of the main difficulties in lifelong machine learning algorithms. Building on the idea of lifelong machine learning, this thesis constructs two effective incremental multi-class classification models for protein family data, extends the application domain of lifelong machine learning, and offers a new approach to multi-class classification problems in bioinformatics. The main research contents and contributions of this thesis are as follows:

1. Data. This thesis curates the protein family data in the Pfam database and provides a strictly labeled, redundancy-removed protein family dataset for the study of lifelong machine learning models. Because the dataset contains a large number of tasks, each with only a small number of samples, it is well suited to research on lifelong machine learning models and on discovering associations between tasks. The thesis uses a variety of methods, drawn from both computer science and biology, to construct features from amino acid sequences; these features achieve good classification results in both proposed models.

2. Algorithms. This thesis designs two lifelong machine learning methods, distinguished by whether the training data of historical tasks must be retained. In the first method, the multi-class SVM model is split into subtasks, so that each subtask can select a more suitable feature subspace and the model can be trained incrementally while retaining the training data of historical tasks. The second method uses whether an autoencoder can effectively reconstruct the input feature vector as the classification criterion. As a novel element, it introduces a previous-task loss and a mean-value loss on the hidden layer, yielding a lifelong machine learning classification method that does not need to retain the training data of historical tasks.

3. Application. Based on the proposed SVM lifelong machine learning method, this thesis builds a web service for protein family classification, which currently supports 26 protein families. On this task, the model reaches a final accuracy of 0.9934, specificity of 0.9995, sensitivity of 0.9873 and MCC of 0.9869, demonstrating the effectiveness of the proposed algorithm.
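The abstract does not name the sequence features it constructs. As one illustration only, a common baseline feature for protein sequences is amino acid composition (AAC), the relative frequency of each of the 20 standard amino acids. The minimal Python sketch below computes it, assuming one-letter residue codes and simply ignoring non-standard symbols such as X; it is not necessarily one of the features used in the thesis.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes), in alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(sequence: str) -> list[float]:
    """Return the 20-dimensional amino acid composition of a sequence.

    Entry i is the fraction of standard residues equal to AMINO_ACIDS[i].
    Non-standard symbols (e.g. X, B, Z) are ignored; lowercase is accepted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    vector = aac_features("MKKLLA")
    print(dict(zip(AMINO_ACIDS, vector)))
```

AAC maps sequences of any length to a fixed-length vector, which is what a classifier such as an SVM or an autoencoder needs as input; richer encodings (for example dipeptide composition or physicochemical descriptors) follow the same pattern.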
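The reconstruction-based criterion of the second method can be illustrated by its decision rule alone: assign an input to the class whose model reconstructs it with the lowest error. The Python sketch below assumes one reconstruction model per protein family, which the abstract does not state, and uses mean squared error as the error measure. `make_projection_model` is a hypothetical stand-in for a trained autoencoder, and the previous-task and mean-value losses used during training are not modeled.

```python
from typing import Callable, Sequence

Vector = Sequence[float]
# A reconstructor maps an input vector to its reconstruction; here it stands
# in for the encode-then-decode pass of one trained autoencoder (assumption).
Reconstructor = Callable[[Vector], list[float]]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def classify_by_reconstruction(x: Vector, models: dict[str, Reconstructor]) -> str:
    """Return the label whose model reconstructs x with the smallest error."""
    return min(models, key=lambda label: reconstruction_error(x, models[label](x)))


def make_projection_model(prototype: Vector) -> Reconstructor:
    """Hypothetical stand-in for a trained autoencoder.

    Reconstructs x as its orthogonal projection onto the line spanned by
    `prototype`, so inputs pointing along the prototype are reconstructed well.
    """
    norm_sq = sum(p * p for p in prototype)

    def reconstruct(x: Vector) -> list[float]:
        scale = sum(a * p for a, p in zip(x, prototype)) / norm_sq
        return [scale * p for p in prototype]

    return reconstruct


if __name__ == "__main__":
    models = {
        "family_A": make_projection_model([1.0, 0.0]),
        "family_B": make_projection_model([0.0, 1.0]),
    }
    print(classify_by_reconstruction([0.9, 0.1], models))
```

Under the one-model-per-family assumption, supporting a new family only adds an entry to `models` and leaves the existing models untouched, which is one way such a classifier can grow without revisiting historical training data.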
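The four reported metrics follow the standard confusion-matrix definitions: accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The Python sketch below computes them for one family treated one-vs-rest. The abstract does not say how the values are aggregated over the 26 families, so any averaging across families is an assumption, and the family labels in the usage example are made up.

```python
import math
from typing import Hashable, Sequence


def one_vs_rest_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                       label: Hashable) -> tuple[int, int, int, int]:
    """Confusion-matrix counts (tp, tn, fp, fn) treating `label` as positive."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1
        elif t != label and p != label:
            tn += 1
        elif p == label:  # predicted positive, actually negative
            fp += 1
        else:  # actually positive, predicted negative
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Accuracy, specificity, sensitivity and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {
        "accuracy": (tp + tn) / total,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    y_true = ["PF_a", "PF_b", "PF_a", "PF_c"]  # hypothetical family labels
    y_pred = ["PF_a", "PF_a", "PF_a", "PF_c"]
    print(binary_metrics(*one_vs_rest_counts(y_true, y_pred, "PF_a")))
```

A macro-averaged figure would call `binary_metrics` once per family and average each metric; a micro-averaged one would sum the counts first. Because these can differ noticeably on imbalanced data, the choice should be stated alongside the reported numbers.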
Keywords: lifelong machine learning, protein classification, machine learning, bioinformatics