Feature Fusion And Design Of Prediction System For Thermophilic Proteins

Posted on: 2022-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: F Lu
GTID: 2480306491452514
Subject: Automation Technology
Abstract/Summary:
As the main carriers of life activities, proteins play an important role in life science research. The study of thermophilic proteins can help us better understand the mechanisms of disease and provides important clues for the functional study of thermophilic proteins and the design of related drugs and catalysts. Although conventional biological experiments can identify protein categories accurately, they are costly and time-consuming and cannot meet the requirements of large-scale protein recognition. It is therefore particularly important to develop reliable computational methods that predict protein categories quickly and accurately. Most protein prediction models extract only a single feature, which leads to poor prediction performance. This paper therefore integrates protein features using two feature fusion methods, feature-level fusion and decision-level fusion, and then constructs prediction models for thermophilic proteins using machine learning classification algorithms. The research in this paper is part of the National Natural Science Foundation project “Key Technology Research on Collaborative Relocation of Human Coronavirus Drugs Based on Multi-source Heterogeneous Data”, and its main contents are as follows:

(1) A prediction model for thermophilic proteins based on feature-level fusion and variable importance measures was established. Three methods, multi-stage amino acid composition, g-gap dipeptide composition and composition-transition-distribution, were used to characterize thermophilic proteins, and the features extracted by the three methods were concatenated in series to form a new 499-dimensional feature vector. Although higher-dimensional features can carry more protein information, they also introduce information redundancy and increase computational complexity. A variable importance measures (VIM) algorithm was therefore designed for feature screening, yielding an optimal feature subset of 179 dimensions. Based on comparative experiments, a multi-layer perceptron prediction model was established. The model reached an accuracy of 93.19% on the independent test set, a significant improvement over the traditional approach of extracting a single feature. The overall performance of the model was better than the average of existing methods, which gives the work theoretical and practical significance.

(2) A prediction model for thermophilic proteins based on decision-level feature fusion was constructed. Five feature extraction methods, namely amino acid composition, g-gap dipeptide composition, encoding based on grouped weight, entropy density and the correlation coefficient, were used to characterize thermophilic protein sequences. Five independent base classifiers, one for each feature representation, were then built using SVMs with a Gaussian kernel. The predictions of the five base classifiers were used as second-level input, and a logistic regression model integrated these results, giving a thermophilic protein prediction model based on the Stacking method that indirectly fuses multiple protein sequence features. The experimental results show that the proposed method reaches an accuracy of 93.75% under leave-one-out cross-validation, and that several of its performance evaluation indexes are higher than those of other models. The overall performance of the proposed method is better than that of most reported methods, and it can significantly improve the prediction performance for thermophilic proteins.

(3) A thermophilic protein prediction system was designed and implemented. The system was developed in Python with PyQt5 and integrates the research results of this paper. Following the construction order of the prediction model, the system is divided into a feature extraction module, a feature selection module, a machine learning classification algorithm module and a related literature module. In addition, in the prediction module, users can upload protein data in FASTA format to quickly predict whether a protein is thermophilic.
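
The sequence-to-vector step described above can be illustrated with a minimal Python sketch. It is not the thesis code: the function names (parse_fasta, amino_acid_composition, g_gap_dipeptide_composition, fuse_features) are hypothetical, the convention that g = 0 means adjacent residues is an assumption, and only two descriptors are concatenated, so the fused vector has 420 dimensions rather than the 499 reported in the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def amino_acid_composition(seq):
    """20-dim vector: frequency of each standard residue.

    Non-standard symbols (e.g. X) are ignored; this is an assumption.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    total = 0
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
            total += 1
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def g_gap_dipeptide_composition(seq, g=0):
    """400-dim vector: frequency of residue pairs with g residues between them.

    g = 0 is assumed to mean adjacent residues.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def fuse_features(seq, g=1):
    """Feature-level (serial) fusion: concatenate the two descriptors."""
    return amino_acid_composition(seq) + g_gap_dipeptide_composition(seq, g)


# Usage with a made-up sequence (not real data).
demo = parse_fasta(">demo_protein\nMKVLA\nAGK\n")
vector = fuse_features(demo["demo_protein"], g=1)
print(len(vector))  # 420 = 20 + 400
```

In a pipeline like the one the abstract outlines, a vector of this kind would next pass through feature selection and a classifier; those steps are omitted here because the abstract does not specify them in enough detail to reproduce.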
Keywords/Search Tags: Thermophilic proteins, Multi-layer perceptron, Feature fusion, Variable importance measures, Stacking, PyQt5