Predicting Hormone Binding Proteins Based On Sequence Information

Posted on: 2021-04-08
Degree: Master
Type: Thesis
Country: China
Candidate: J X Tan
GTID: 2370330623967943
Subject: Biophysics
Abstract/Summary:
Hormone binding proteins (HBPs) interact with hormone proteins (HPs) selectively and non-covalently, thereby acting as regulators or inhibitors of HPs. Accurate identification of HBPs is an important prerequisite for understanding cell growth, development, and functional mechanisms. Traditional identification methods rely on complex biological experiments, which are time-consuming and labor-intensive. To overcome this drawback, researchers have recently begun to use machine learning methods to identify HBPs. However, different machine learning algorithms differ in predictive performance; most give unsatisfactory results, and their classification ability needs to be improved. In this thesis, we therefore investigated the performance of a variety of algorithms using cross-validation with standard evaluation metrics, and then selected the best-performing model as the final HBP prediction model.

We downloaded a set of raw HBP data from the UniProt database and constructed a rigorous and objective benchmark dataset through strict screening. We used several feature extraction methods to generate features and examined their performance with a support vector machine (SVM) under 5-fold cross-validation for classifying HBP versus non-HBP. The results were as follows. (1) Using the CTD method to extract amino acid position, composition, and distribution information from HBP sequences, we obtained an overall accuracy of 60.16%. (2) The natural vector (NV) method, which extracts the number of amino acids, their average positions, and the normalized central moments from HBP sequences, reached an overall accuracy of 70.33%. (3) The g-gap dipeptide composition (g-gap) method achieved an overall accuracy of 72.76%. (4) The PseAAC method, which integrates the physicochemical properties of amino acids to extract information such as amino acid composition and long- and short-range correlations in HBP sequences, produced an overall accuracy of 76.83%. (5) The tripeptide composition (TPC) method, which extracts tripeptide composition information from HBP sequences, gave an overall accuracy of 72.36%.

The models constructed above have limited predictive ability and low classification accuracy, leaving room for improvement through feature optimization. To remove the redundant or noisy information in the high-dimensional features (g-gap, PseAAC, and TPC features), we introduced analysis of variance (ANOVA) and the binomial distribution (BD) method to rank the features, and applied incremental feature selection (IFS) to construct feature subsets and identify the best features. The SVM algorithm was again used to classify HBP and non-HBP, and model performance was again evaluated by 5-fold cross-validation. The following results were obtained. (1) Screening g-gap features with ANOVA gave a highest overall accuracy of 80.89%. (2) Screening PseAAC features with ANOVA gave a highest overall accuracy of 84.15%. (3) The best overall accuracy, 97.15%, was achieved with the optimal TPC features selected by the BD method. Compared with existing models, the model proposed in this thesis shows the best predictive performance and the best robustness.

To make the HBP prediction model easy for researchers to use, we built a user-friendly online server, HBPred2.0 (http://lingroup.cn/server/HBPred2.0), which provides useful guidance for HBP prediction. In future work, as protein sequence data pour into databases at an explosive rate, deep learning methods that have shown powerful learning capabilities on large data volumes, such as the Inception neural network, the ResNet neural network, and convolutional neural networks, are worth investigating. These methods are expected to provide further help for the study of HBPs.
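
As a rough illustration of two of the steps described in the abstract, the Python sketch below computes a g-gap dipeptide composition vector for a sequence and ranks features by a one-way ANOVA F-score. It is not the thesis implementation: the function names, the decision to skip pairs containing non-standard residues, and the two-class F-score formulation are assumptions made for this example, and the IFS loop and SVM training are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature index.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=0):
    """Return 400 normalized frequencies of residue pairs separated by g residues."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # assumption: ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in DIPEPTIDES]


def anova_f_score(positive, negative):
    """One-way ANOVA F-score of a single feature across two classes."""
    groups = [positive, negative]
    n_total = sum(len(grp) for grp in groups)
    grand_mean = sum(sum(grp) for grp in groups) / n_total
    # Between-group and within-group sums of squares.
    between = sum(len(grp) * (sum(grp) / len(grp) - grand_mean) ** 2 for grp in groups)
    within = sum((x - sum(grp) / len(grp)) ** 2 for grp in groups for x in grp)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)


def rank_features(pos_vectors, neg_vectors):
    """Return feature indices sorted by decreasing F-score."""
    n_features = len(pos_vectors[0])
    scores = [
        anova_f_score([v[j] for v in pos_vectors], [v[j] for v in neg_vectors])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In an IFS-style procedure, the ranked index list would be consumed from the top: features are added one at a time (or in small batches), a classifier is cross-validated on each growing subset, and the subset with the highest accuracy is kept.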
Keywords/Search Tags: hormone binding protein, feature extraction method, feature selection method, support vector machine, web server