Rheumatoid arthritis (RA) is a chronic, generally progressive autoimmune disease characterized by symmetric swelling and pain in multiple joints throughout the body. It is a common chronic inflammatory joint disease that mainly affects adults, and its principal clinical manifestation is joint inflammation. A pathophysiological study revealed that spleen tyrosine kinase (SYK) plays an important role in the pathogenesis of RA. SYK is an intracellular, cytoplasmic non-receptor tyrosine kinase, and SYK inhibitors can therefore be used to treat RA. This study examines small molecules with strong inhibitory activity against SYK. Based on the biological activity of these molecules (pIC50), a quantitative structure-activity relationship (QSAR) study was conducted in Python, with the data divided into a training set, a test set, and a validation set. Several machine learning algorithms were used to build QSAR models, evaluate their predictive ability, and select a subset of feature descriptors for predicting new SYK inhibitors. The specific research content is as follows:

(1) Descriptor screening: 1,444 molecular descriptors were calculated and filtered with a three-step selection procedure. After screening, a few dozen descriptors remained, with pairwise correlations below 0.6 and individual variances above 0.1.

(2) QSAR model construction: random forest and other algorithms were used to predict the biological activity of 238 structurally diverse SYK inhibitors, and a feature selection algorithm was applied to choose the best subset from the remaining few dozen descriptors. The final QSAR model showed good predictive ability and identified the descriptors most important for predicting the biological activity of SYK inhibitors. In addition, a genetic algorithm coupled with support vector regression (GA-SVR) was used to predict the activity of SYK inhibitors. This approach selects descriptors and optimizes the support vector machine parameters simultaneously, which greatly increases the computational efficiency of the model and also improves its predictive power. The resulting model gives a training-set R² of 0.94 and a test-set R² of 0.91. The established models can be used to screen, predict, and optimize SYK inhibitors before drug synthesis.

(3) Results: ten 2D descriptors were selected as features. The training-set R² values were 0.91 for random forest, 0.94 for support vector machine, 0.76 for k-nearest neighbors (KNN), and 0.88 for gradient boosted regression trees (GBRT), with the support vector machine performing best. The models established in this study show high predictive ability: they can screen out promising drug candidates at an early stage and provide theoretical support for the future synthesis of highly active drug molecules.
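The comparison of random forest, support vector machine, KNN, and GBRT reported in the results can be sketched with scikit-learn. This is an illustrative setup, not the study's code: the hypothetical function `compare_models`, the hyperparameters, the 80/20 split, standardizing descriptors for SVR and KNN, and the synthetic data from `make_regression` (238 samples by 10 features, chosen only to mirror the sizes above) are all assumptions, so the printed R² values will not reproduce the reported ones.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def compare_models(X, y, seed=0):
    """Fit four regressors and return {name: (training R2, test R2)}."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    models = {
        "RF": RandomForestRegressor(n_estimators=200, random_state=seed),
        # SVR and KNN depend on distances, so descriptors are standardized first.
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
        "GBRT": GradientBoostingRegressor(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        results[name] = (
            r2_score(y_tr, model.predict(X_tr)),
            r2_score(y_te, model.predict(X_te)),
        )
    return results


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the study's SYK inhibitor set.
    X, y = make_regression(n_samples=238, n_features=10, noise=10.0, random_state=0)
    for name, (r2_tr, r2_te) in compare_models(X, y).items():
        print(f"{name}: train R2 = {r2_tr:.2f}, test R2 = {r2_te:.2f}")
```

Reporting test-set R² next to training-set R², as this sketch does, makes it easier to see whether a high training score reflects genuine predictive ability or overfitting.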
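The GA-SVR idea, in which one chromosome encodes both a descriptor subset and the SVR hyperparameters so that both are optimized together, can be sketched as follows. Everything concrete here is an assumption rather than the study's configuration: the hypothetical names `fitness` and `ga_svr`, the encoding (a boolean mask plus log10 C and log10 gamma), 5-fold cross-validated R² as the fitness, tournament selection, uniform crossover, bit-flip and Gaussian mutation, elitism, and the search ranges.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fitness(ind, X, y):
    """Mean 5-fold cross-validated R2 of an SVR on the descriptors selected by ind."""
    mask, log_c, log_gamma = ind
    if not mask.any():
        return -np.inf  # an empty descriptor subset is invalid
    model = make_pipeline(StandardScaler(), SVR(C=10.0 ** log_c, gamma=10.0 ** log_gamma))
    return cross_val_score(model, X[:, mask], y, cv=5, scoring="r2").mean()


def ga_svr(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    """Jointly search descriptor subsets and SVR (C, gamma); return (best, score)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Each individual: (boolean descriptor mask, log10 C, log10 gamma).
    pop = [(rng.random(n) < 0.5, rng.uniform(-1, 3), rng.uniform(-4, 0))
           for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind, X, y) for ind in pop]
        children = [pop[int(np.argmax(scores))]]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            # Tournament selection: each parent is the fitter of two random picks.
            parents = []
            for _ in range(2):
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if scores[i] >= scores[j] else pop[j])
            (m1, c1, g1), (m2, c2, g2) = parents
            # Uniform crossover plus bit-flip mutation on the descriptor mask.
            mask = np.where(rng.random(n) < 0.5, m1, m2)
            mask = mask ^ (rng.random(n) < p_mut)
            # Blend crossover plus Gaussian mutation on the SVR parameters, kept in range.
            a = rng.random()
            log_c = np.clip(a * c1 + (1 - a) * c2 + rng.normal(0, 0.2), -1, 3)
            log_gamma = np.clip(a * g1 + (1 - a) * g2 + rng.normal(0, 0.2), -4, 0)
            children.append((mask, log_c, log_gamma))
        pop = children
    scores = [fitness(ind, X, y) for ind in pop]
    return pop[int(np.argmax(scores))], max(scores)


if __name__ == "__main__":
    # Synthetic stand-in data, NOT the study's: 10 features, 4 of them informative.
    X, y = make_regression(n_samples=238, n_features=10, n_informative=4,
                           noise=5.0, random_state=0)
    (mask, log_c, log_gamma), score = ga_svr(X, y)
    print("selected:", np.flatnonzero(mask),
          f"C={10 ** log_c:.3g} gamma={10 ** log_gamma:.3g} CV R2={score:.2f}")
```

Scoring each candidate by cross-validation rather than by training-set R² penalizes descriptor subsets that overfit, which matters when a few dozen descriptors are searched with only a couple of hundred molecules.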