| G protein-coupled receptors (GPCRs) are among the most important drug targets, accounting for approximately 34% of marketed drug targets. In drug discovery, accurate modeling and interpretation of ligand biological activity is essential for screening and optimizing ligands. Traditional machine learning methods are of limited effectiveness here, so new methods are needed that can accurately assess the biological activity of a ligand and identify the key substructure features that determine it. Previous studies have shown that homologous GPCRs facilitate the modeling and interpretation of the biological activity of ligand molecules. This paper therefore proposes a new method, GLEM, which identifies key GPCR-related substructures through group sparsity and predicts ligand biological activity with multi-source deep transfer learning, with the extended-connectivity fingerprint (ECFP) features screened within this multi-source deep transfer learning framework. GLEM consists of five consecutive steps: (i) characterizing each ligand molecule by its ECFP; (ii) using Group Lasso to select ligand features in the target and source domains, yielding the key substructures that determine ligand biological activity; (iii) training a deep multi-task learning model on the source-domain data; (iv) obtaining a biological activity model for ligand molecules in the target domain through deep multi-task transfer learning; and (v) predicting the biological activities of target-domain ligand molecules with this transferred model. GLEM was tested on 30 representative GPCRs in 9 groups covering most subfamilies of human GPCRs, with 60 to 3000 ligand associations per subfamily. Measured by the squared correlation coefficient (r²), GLEM improves on average by 31.72% over the single-task learning method, by 22.45% over the deep multi-task learning method, by 7.8% over the group-sparse deep multi-task learning method, and by 3.6% over the single-source transfer learning method, showing strong performance in modeling ligand biological activity. Among the methods compared in this paper, GLEM performs best on most datasets. We also examined the influence of the number of training samples and found that GLEM performs best when training samples are few. |
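Step (i) characterizes each ligand by its ECFP. The sketch below illustrates the general ECFP idea (iteratively hashing each atom's identifier together with its neighbors' identifiers, then folding all identifiers into a fixed-length bit vector). It is a simplified, hypothetical illustration, not the fingerprint code used by GLEM: the input format (`atoms` as atomic numbers, `bonds` as `(i, j, order)` tuples), the function names, and the reduced atom invariants (atomic number and degree only) are all assumptions. In practice a cheminformatics toolkit such as RDKit's Morgan fingerprint implementation would normally be used.

```python
import hashlib


def _stable_hash(values):
    """Deterministic 32-bit hash of a sequence of ints (Python's hash() is salted for str)."""
    data = ",".join(str(v) for v in values).encode("ascii")
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def ecfp_like(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP-style fingerprint (hypothetical helper).

    atoms: list of atomic numbers, one per heavy atom.
    bonds: list of (i, j, bond_order) tuples.
    Returns the set of "on" bit indices after folding identifiers into n_bits.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((order, j))
        neighbors[j].append((order, i))

    # Iteration 0: initial identifier from simplified atom invariants (atomic number, degree).
    ids = [_stable_hash((z, len(neighbors[i]))) for i, z in enumerate(atoms)]
    features = set(ids)

    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            # Sort (bond order, neighbor identifier) pairs so atom numbering does not matter.
            env = sorted((order, ids[j]) for order, j in neighbors[i])
            flat = [ids[i]] + [x for pair in env for x in pair]
            new_ids.append(_stable_hash(flat))
        ids = new_ids
        features.update(ids)

    # Fold the unbounded identifier space into a fixed-length bit vector.
    return {f % n_bits for f in features}


# Ethanol (C-C-O) as a toy input; renumbering the atoms yields the same bit set.
print(sorted(ecfp_like([6, 6, 8], [(0, 1, 1), (1, 2, 1)])))
```

Each bit corresponds (up to hash collisions) to a circular substructure around some atom, which is what lets a sparse model in the next step point back to interpretable substructures.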
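Step (ii) uses Group Lasso, whose penalty sums the Euclidean norms of coefficient groups so that whole groups are driven to exactly zero. The following is a minimal sketch of a standard Group Lasso solver (proximal gradient with block soft-thresholding), assuming the objective $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \lambda \sum_g \sqrt{|g|}\,\lVert w_g\rVert_2$. How GLEM defines its groups is not stated in the abstract; the column grouping below is hypothetical (one plausible alternative would be to group a feature's coefficients across the source and target domains). The function name and parameters are illustrative.

```python
import math


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal-gradient (ISTA) solver for Group Lasso (hypothetical helper).

    X: list of rows, y: list of targets,
    groups: list of lists of column indices (columns not listed are left unpenalized).
    """
    n, p = len(X), len(X[0])
    # Step size n / ||X||_F^2 is at most 1/L for this loss, which guarantees convergence.
    fro2 = sum(v * v for row in X for v in row)
    step = n / fro2
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the squared-error term: -(1/n) X^T (y - Xw).
        resid = [yi - sum(xij * wj for xij, wj in zip(row, w)) for row, yi in zip(X, y)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        z = [wj - step * gj for wj, gj in zip(w, grad)]
        # Block soft-thresholding: shrink each group's norm, zeroing weak groups entirely.
        for g in groups:
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in g:
                z[j] *= scale
        w = z
    return w


# Toy orthogonal design: only the first feature group carries signal.
X = [[1.0, 1.0, 1.0, 1.0],
     [1.0, -1.0, 1.0, -1.0],
     [1.0, 1.0, -1.0, -1.0],
     [1.0, -1.0, -1.0, 1.0]]
y = [3.1, 1.0, 3.0, 0.9]
print(group_lasso(X, y, groups=[[0, 1], [2, 3]], lam=0.1))
```

In this toy run the second group is set exactly to zero while the first is only shrunk, which is the mechanism that lets a group-sparse model report a small set of retained substructure features.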
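The comparisons above are reported as average percentage increases in r². The abstract does not say whether these are relative or absolute gains, nor whether r² denotes the squared Pearson correlation or the coefficient of determination. The sketch below assumes the squared Pearson correlation between observed and predicted activities and a per-dataset relative gain averaged over datasets; both function names are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted activities."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    vt = sum((a - mt) ** 2 for a in y_true)
    vp = sum((b - mp) ** 2 for b in y_pred)
    return cov * cov / (vt * vp)


def mean_relative_gain(new_scores, baseline_scores):
    """Average per-dataset relative improvement of new_scores over baseline_scores, in percent."""
    gains = [100.0 * (a - b) / b for a, b in zip(new_scores, baseline_scores)]
    return sum(gains) / len(gains)


print(r_squared([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]))  # r = 0.8, so r^2 is about 0.64
print(mean_relative_gain([0.6, 0.9], [0.5, 0.6]))  # gains of 20% and 50%, about 35
```

Note that averaging relative gains weights a dataset with a low baseline r² more heavily than averaging absolute differences would, so the two conventions can give noticeably different summary numbers.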