Computational Researches On Sequence-Based Transmembrane Protein-Ligand Binding

Posted on: 2021-01-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Lu
GTID: 1360330620978556
Subject: Genetics
Abstract/Summary:
Bioinformatics is an interdisciplinary field that models, analyzes, and simulates biological problems using mathematics and statistics. With breakthroughs in biological research methods, the accumulation of biological data, and the rapid development of computer technology, the era of big biological data has arrived. Bioinformatics research is maturing and has become an indispensable part of biology. At the molecular level, bioinformatics mainly comprises genomics and proteomics. The research in this dissertation belongs to proteomics: it predicts and analyzes the structure and function of membrane proteins and transmembrane proteins using machine learning methods.

Membrane proteins are a class of proteins with unique structure and function that are closely associated with biological membranes, either permanently attached to them or interacting with them transiently. They are involved in many important cellular activities, such as material transport, signal transduction, immune response, and energy metabolism. Transmembrane proteins are the most typical and abundant type of membrane protein; they span the biological membrane and are permanently embedded in it. Abnormalities in them can directly cause disease. Transmembrane proteins are also key research targets in medicine and pharmacy. Because of the biological significance of membrane proteins and transmembrane proteins, researchers have studied them extensively and achieved fruitful results. Bioinformatics studies of membrane proteins and transmembrane proteins can assist in building protein interaction networks, mapping metabolic pathways, and screening drugs. The study of the structure and function of membrane proteins and transmembrane proteins has become an important research direction in bioinformatics.

In this dissertation, a series of studies on membrane proteins
and transmembrane proteins is carried out. The main objective is to extract features from the primary sequence of proteins and to predict and analyze membrane protein-ligand interactions through machine learning. First, to compensate for the limited information in the primary sequence, two deep learning-based predictors of transmembrane protein structural descriptors were constructed, namely TMP-SSurface and TM-ZC. Residue surface accessibility and the z-coordinate are structural properties closely related to function, and they can support follow-up research on membrane protein function. Next, the predicted residue surface accessibility and z-coordinates were used as features to construct a membrane protein-ligand binding site predictor (MPLs-Pred) based on random forest, and ligand-specific prediction models were trained for different ligand types to further improve prediction performance. While studying membrane protein-ligand interactions, we noticed a typical ligand whose targets are mainly membrane proteins, ubiquinone, and constructed a ubiquinone-binding protein predictor (UBPs-Pred) based on XGBoost. In addition, bioinformatics analysis of ubiquinone-binding proteins was carried out. Following these research ideas, the work of this dissertation is as follows:

1) A transmembrane protein residue surface accessibility predictor (TMP-SSurface) based on deep learning was proposed. Residue surface accessibility describes the exposure of a residue to the external environment and is measured by the relative solvent-accessible surface area. TMP-SSurface applies to all residues of the complete sequence for all types of transmembrane proteins, with no restriction on the protein type or on the topology of the residues; that is, it requires no prior knowledge. TMP-SSurface uses evolutionary conservation, binary coding, and a sequence terminal identifier as input features. The
prediction model is a deep learning network that integrates Inception and CapsuleNet. Experiments show that TMP-SSurface is a stable and efficient model with good generalization ability and that it performs well on different types of transmembrane proteins. The predictor also depends less on features, since the deep learning network can explore the intrinsic relationship between the sequence and structure of transmembrane proteins.

2) A deep learning-based transmembrane protein residue z-coordinate predictor, TM-ZC, was proposed. The z-coordinate of a transmembrane protein residue describes the perpendicular distance from the residue to the central plane of the biological membrane and is a structural descriptor that quantifies the position of the residue relative to the membrane. Like TMP-SSurface, TM-ZC applies to all residues of the complete sequence for all types of transmembrane proteins, and users need no prior knowledge. TM-ZC uses the same features as TMP-SSurface: evolutionary conservation, binary coding, and sequence terminal identifiers. The prediction model is a convolutional neural network with seven convolutional layers. Experimental results show that TM-ZC is stable, generalizes well, and has excellent prediction performance for all kinds of transmembrane proteins.

3) Building on the previous work, a membrane protein-ligand binding site predictor (MPLs-Pred) based on random forest was proposed. Many essential functions of proteins depend on interactions with ligands, and ligand-binding site prediction is one of the crucial tasks of protein functional annotation. MPLs-Pred uses four features to characterize membrane protein residues: evolutionary conservation, physicochemical properties, surface accessibility, and z-coordinate, the last two predicted by TMP-SSurface and TM-ZC, respectively. The classifier of MPLs-Pred is a random forest, and the
multi-fold random undersampling strategy is used to address the severe class imbalance. In addition, considering the large differences among ligands, ligands were divided into three types, namely drug-like compounds, metals, and biological macromolecules, and ligand-specific prediction models were trained to further improve performance. Gene Ontology enrichment analysis and KEGG pathway enrichment analysis were also carried out on human membrane proteins targeted by drug-like compounds.

4) In the course of sorting and analyzing the membrane protein-ligand interaction data, ubiquinone attracted the author's attention: 86.9% of its target proteins are membrane proteins, among which 68.5% are transmembrane proteins, making it a typical ligand that targets membrane proteins. A ubiquinone-binding protein recognition model, UBPs-Pred, was proposed, and bioinformatics analysis of ubiquinone-binding proteins was carried out. UBPs-Pred encodes proteins with three feature types (amino acid composition, dipeptide composition, and evolutionary conservation), uses a random forest to rank feature importance, and performs feature selection with an incremental feature selection strategy. UBPs-Pred uses XGBoost as the classifier. Because XGBoost has many parameters and its performance is sensitive to them, a multi-objective particle swarm optimization algorithm was used to tune the XGBoost parameters. The results show that UBPs-Pred performs excellently. To further understand ubiquinone-binding proteins, bioinformatics analysis was carried out, including statistics on the modules in the ubiquinone-binding domain, classification and statistics of the superfamilies of ubiquinone-binding proteins, and Gene Ontology and KEGG pathway enrichment analysis of human ubiquinone-binding proteins.
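
Two of the sequence encodings named for UBPs-Pred, amino acid composition and dipeptide composition, can be illustrated with a short sketch. The code below is not taken from the dissertation: the function names, the restriction to the 20 standard residues, and the choice to skip dipeptides that contain a non-standard residue are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes); an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids, in a fixed order.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Return 20 values: the fraction of each standard residue in the sequence."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 values: the fraction of each adjacent residue pair.

    Pairs containing a non-standard residue are skipped (illustrative choice).
    """
    seq = sequence.upper()
    pairs = [
        a + b
        for a, b in zip(seq, seq[1:])
        if a in AMINO_ACIDS and b in AMINO_ACIDS
    ]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


def encode_protein(sequence):
    """Concatenate both compositions into one 420-dimensional feature vector."""
    return amino_acid_composition(sequence) + dipeptide_composition(sequence)


if __name__ == "__main__":
    features = encode_protein("MKTAYIAKQRQISFVKSHFSRQ")  # hypothetical sequence
    print(len(features))
```

The third feature type, evolutionary conservation, is left out here because it depends on sequence alignment against a database rather than on the sequence alone. A vector like this would then pass through the feature ranking and incremental feature selection steps described in the abstract before reaching the classifier.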
Keywords/Search Tags: Membrane Protein, Transmembrane Protein, Surface Accessibility, Z Coordinate, Ligand-Protein Binding Site, Ubiquinone-Binding Protein, Machine Learning