Prediction Of GPCRs And Identifying GPCR-drug Interaction Based On Machine Learning

Posted on: 2022-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Z Lv
Full Text: PDF
GTID: 2504306611957949
Subject: Automation Technology
Abstract/Summary:
G-protein coupled receptors (GPCRs) are the largest family of membrane proteins in humans. They are involved in the regulation of various physiological processes and are among the most important targets in modern drug development. However, identifying GPCRs and their interactions with drugs by traditional experimental methods is time-consuming. In recent years, with the rapid development of computing and biology, a large number of GPCR amino acid sequences have been determined, collected and published, and data on drug molecules that interact with GPCRs have been accumulating. It is therefore feasible to build predictive models that accurately identify GPCRs and predict interactions between GPCRs and drugs (GPCR-drug). The specific research work in this paper includes the following points:

1. An automatic recognition model based on protein sequence is built to identify whether unknown proteins are GPCRs. First, word-embedding technology and a Bag of Words (BOW) model combined with weighted profile coefficients are used to extract an original feature vector of 1011 dimensions from the protein sequences of GPCRs. Then, an artificial neural network is used to optimize the features further, and a GPCR recognition model is constructed with the machine learning algorithm XGBoost. Finally, the proposed model is tested with different validation methods. Compared with other state-of-the-art models, the proposed model has the best performance.

2. An automated model is built to predict the presence or absence of GPCR-drug interactions. For GPCRs, the BOW model is used to extract protein sequence features. For drug molecules, the discrete wavelet transform (DWT) is employed to extract features from the raw molecular fingerprints. Then, the SMOTE algorithm is used to balance the training dataset, and an artificial neural network is used to extract features further. Next, a gradient boosted decision tree (GBDT) model is trained on the features selected by the neural network. Finally, the proposed model is tested by leave-one-out validation and an independent test. In both leave-one-out validation and independent testing, the proposed model outperforms the other advanced models it is compared with.

3. To make it easier for researchers to use and test the results of this paper, the author establishes a prediction platform that provides online services based on the automated prediction models in 1 and 2. Because researchers may have different needs, the platform provides the following four services:
(1) Identify whether the given sequences are GPCRs.
(2) Predict whether GPCR sequences and drug molecules interact.
(3) Return the drug molecules that can interact with a given sequence from the drug molecule dataset constructed in this paper.
(4) For a given sequence, first identify whether it is a GPCR and, if it is, return the drug molecules that can interact with it from the drug molecule dataset constructed in this paper.

In summary, this paper studies in depth the identification of GPCRs and the prediction of their interactions with drug molecules, builds on existing methods to obtain models with better predictive performance, and establishes an online prediction platform based on these models.
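As a rough illustration of the two feature-extraction ideas named in the abstract, the sketch below computes a bag-of-words vector for a protein sequence and a single-level discrete wavelet transform of a fingerprint vector. It is not the thesis's implementation: the abstract does not give the word length, vocabulary, wavelet family, or decomposition depth, so the 2-residue words, the 20-letter alphabet, the Haar wavelet, and the function names `bow_features` and `haar_dwt` are all assumptions made for illustration. The weighted profile coefficients, word embeddings, SMOTE, neural network, and boosted-tree stages are not shown.

```python
from collections import Counter
from itertools import product
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def bow_features(sequence, k=2):
    """Normalized counts of length-k words over a fixed vocabulary.

    k=2 is an illustrative choice; words containing non-standard
    residues are ignored.
    """
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(words)
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]


def haar_dwt(signal):
    """One level of the Haar discrete wavelet transform (assumed wavelet).

    Returns (approximation, detail) coefficients. An odd-length input is
    padded by repeating its last value.
    """
    values = list(signal)
    if len(values) % 2:
        values.append(values[-1])
    scale = math.sqrt(2.0)
    approx = [(values[i] + values[i + 1]) / scale for i in range(0, len(values), 2)]
    detail = [(values[i] - values[i + 1]) / scale for i in range(0, len(values), 2)]
    return approx, detail


# Toy inputs, not data from the thesis.
protein_vec = bow_features("MKTAYIAKQR")             # 20**2 = 400 values
approx, detail = haar_dwt([1, 0, 0, 1, 1, 1, 0, 0])  # 4 + 4 coefficients
pair_vec = protein_vec + approx + detail             # one GPCR-drug feature vector
```

Concatenating the protein and drug features into a single vector per pair is one simple way to present a GPCR-drug pair to a downstream classifier; the abstract does not state how the thesis combines them.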
Keywords/Search Tags: GPCRs, GPCR-drug, feature extraction, algorithm, forecasting platform, BOW, Word-Embedding