Sequence Based Prediction Of Transmembrane Protein Crystallization Propensity

Posted on: 2021-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Z Zhu
GTID: 2370330620965710
Subject: Biology
Abstract/Summary:
Transmembrane proteins play an important role in cellular activities. They act as channels that connect the inside and outside of the cell for material transport, as receptors for signal recognition, and as drug targets involved in drug responses. Exploring the function of transmembrane proteins helps deepen our understanding of biological activity. Because protein structure and function are closely linked, analyzing the three-dimensional structure of transmembrane proteins can, to a large extent, clarify their roles in cellular activities. At present, the main method for determining protein structure is single-crystal X-ray diffraction. This method first requires the protein of interest to be crystallized, but protein crystallization is an expensive and time-consuming process that involves a series of complicated trials to obtain high-quality crystals. Previous studies have shown that not all proteins can form crystals, and transmembrane proteins, because of their special structure, are even harder to crystallize. To reduce experimental effort and improve efficiency, a series of tools have been developed to predict protein crystallization propensity, but tools designed specifically for transmembrane proteins remain very limited, and their performance needs further improvement.

To address this problem, this thesis proposes a new method for predicting the crystallization propensity of transmembrane proteins and further improves its performance through feature optimization. Following the features used in previous crystallization propensity predictors, we first quantified biological features of transmembrane proteins, including amino acid composition, amino acid physicochemical properties, and pseudo amino acid composition. Using feature combination and feature extraction, we obtained a relatively optimal feature subset. We then compared the predictions of five commonly used machine learning models (Random Forest, Support Vector Machine, K-Nearest Neighbor, Logistic Regression, and XGBoost) and finally chose XGBoost combined with the EasyEnsemble method to build a new prediction tool for transmembrane protein crystallization, PTMC I. The results show that the frequency of amino acids in the sequence and global features derived from the intrinsic physicochemical properties of proteins play an important role in predicting transmembrane protein crystallization. Compared with general (not transmembrane-specific) crystallization propensity predictors, PTMC I performs better at predicting the crystallization propensity of transmembrane proteins, reaching an AUC of 0.865 on the test set.

Transmembrane proteins are often difficult to dissolve because of their special physicochemical properties. Based on the characteristics of transmembrane proteins in crystallization experiments and on related literature, we quantified the distribution of relative solvent accessible surface area (RSA) and the product of the RSA distribution and hydrophobicity. Combining these with general biological features, we selected 60 features through feature selection to build the final model, PTMC II. Trained with XGBoost on the selected features, PTMC II reached an AUC of 0.952 on the test set, much higher than PTMC I. This shows that combining sequence information with structural information predicts the crystallization propensity of transmembrane proteins more effectively. In recent years, as protein structure research has attracted increasing attention, the amount of crystallization experiment data has also grown. The models we built for predicting transmembrane protein crystallization propensity can greatly help the study of the structure and function of transmembrane proteins.
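The sequence composition features named in the abstract (amino acid composition and pseudo amino acid composition) can be illustrated with a minimal sketch. It computes the 20-dimensional amino acid composition and a type-1 (Chou-style) pseudo amino acid composition that appends `lam` sequence-order correlation factors. The single property scale (Kyte-Doolittle hydropathy), the weight `w = 0.05`, `lam = 3`, and the function names `aac` and `pseaac` are illustrative assumptions, not the settings or code used for PTMC I.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale; using it as the only property is an assumption.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def clean(seq):
    """Keep only the 20 standard residues (drops X, B, gaps, etc.)."""
    return [aa for aa in seq.upper() if aa in AMINO_ACIDS]


def aac(seq):
    """Amino acid composition: relative frequency of each standard residue."""
    residues = clean(seq)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def pseaac(seq, lam=3, w=0.05, scale=KD):
    """Type-1 pseudo amino acid composition with 20 + lam components."""
    h = standardize(scale)
    residues = clean(seq)
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    # theta_j: mean squared property difference between residues j positions apart
    theta = [
        sum((h[residues[i + j]] - h[residues[i]]) ** 2 for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    freqs = aac(seq)
    denom = sum(freqs) + w * sum(theta)
    # First 20 values: normalized composition; last lam values: weighted order factors
    return [f / denom for f in freqs] + [w * t / denom for t in theta]
```

The extra `lam` components carry coarse sequence-order information that plain composition discards; in a real pipeline several property scales would usually be averaged into each correlation factor.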
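The abstract does not specify how the RSA distribution and its product with hydrophobicity were quantified for PTMC II, so the following is only a hypothetical sketch. It assumes per-residue RSA values in [0, 1] are already available (for example, from an external RSA predictor), bins them at assumed thresholds of 0.25 and 0.5, and returns, for each bin, the fraction of residues and the length-normalized sum of RSA multiplied by Kyte-Doolittle hydropathy. The function name `rsa_features`, the bin edges, and the choice of hydropathy scale are all assumptions.

```python
# Kyte-Doolittle hydropathy scale (assumed hydrophobicity measure).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}


def rsa_features(seq, rsa, edges=(0.25, 0.5)):
    """Hypothetical RSA-distribution features.

    seq: protein sequence; rsa: per-residue relative solvent accessibility
    in [0, 1], assumed to come from an external predictor.
    Returns bin fractions followed by per-bin sums of RSA * hydropathy / length.
    """
    if len(seq) != len(rsa):
        raise ValueError("seq and rsa must have the same length")
    n_bins = len(edges) + 1
    counts = [0] * n_bins
    weighted = [0.0] * n_bins
    for aa, r in zip(seq.upper(), rsa):
        b = sum(r >= e for e in edges)  # bin index: 0 = buried, last = exposed
        counts[b] += 1
        weighted[b] += r * KD.get(aa, 0.0)  # non-standard residues contribute 0
    n = len(seq)
    return [c / n for c in counts] + [s / n for s in weighted]


features = rsa_features("AIK", [0.1, 0.3, 0.9])
```

Here, an exposed lysine yields a strongly negative value in the exposed bin, whereas exposed hydrophobic residues (a hint of poor solubility) would push that bin positive.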
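The EasyEnsemble idea used with PTMC I can be sketched as follows. The majority class is repeatedly undersampled to the size of the minority class, one base learner is trained per balanced subset, and their scores are averaged. The evaluation uses ROC AUC, computed here as the probability that a randomly chosen positive scores higher than a randomly chosen negative. To keep the sketch dependency-free, the base learner is a hypothetical nearest-centroid scorer standing in for XGBoost. It is not the thesis's model, and the names `CentroidScorer`, `easy_ensemble`, and `n_subsets=10` are illustrative assumptions.

```python
import random
from math import dist


def auc(labels, scores):
    """ROC AUC: P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


class CentroidScorer:
    """Stand-in base learner (NOT XGBoost): relative distance to class centroids."""

    def fit(self, X, y):
        def centroid(rows):
            return [sum(col) / len(rows) for col in zip(*rows)]
        self.c1 = centroid([x for x, t in zip(X, y) if t == 1])
        self.c0 = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        # Positive when x is closer to the class-1 centroid than to the class-0 one
        return dist(x, self.c0) - dist(x, self.c1)


def easy_ensemble(X, y, n_subsets=10, make_learner=CentroidScorer, seed=0):
    """Train one learner per balanced subset; return an averaged scoring function."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    learners = []
    for _ in range(n_subsets):
        # Each subset: all minority samples plus an equal-sized random majority draw
        idx = minority + rng.sample(majority, len(minority))
        learners.append(make_learner().fit([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: sum(m.score(x) for m in learners) / len(learners)


# Toy imbalanced data: 10 positives, 40 negatives
X = [(2 + 0.1 * i, 2.0) for i in range(10)] + [(0.05 * i, 0.0) for i in range(40)]
y = [1] * 10 + [0] * 40
model = easy_ensemble(X, y)
print(auc(y, [model(x) for x in X]))
```

Because every subset is balanced, no single learner is dominated by the majority class, while averaging over subsets still uses most of the majority data. In practice, one would plug a gradient-boosting classifier into the loop in place of the centroid scorer.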
Keywords/Search Tags: Machine learning, Bioinformatics, Transmembrane protein, Prediction of crystallization propensity