| Quantitative structure-activity relationship (QSAR) modeling has become an indispensable tool for drug researchers. Based on the structure and activity data of known compounds, a regression model is constructed that can predict the activity of unknown compounds and guide drug design. In this dissertation, QSAR modeling programs were designed in Python. The purpose of the design is to simplify the user's operation as much as possible, so that users can complete modeling tasks quickly and avoid erroneous results caused by misoperation. The programs are easy to use, automatic, and efficient; they guard against misoperation and present result reports and related charts visually. They can thereby improve the efficiency of drug researchers, help ensure the accuracy of results, and provide a solid foundation for drug development. Based on LQTA-QSAR, the 4D-QSAR program QSAR-KING was designed to solve the conformation alignment problem in 3D-QSAR. Conformation set files are generated by molecular dynamics simulations of the compound molecules. By screening the descriptors and constructing the models, the effects of different conformations on the model are eliminated. The program design of QSAR-KING covers the following steps: running environment design, program input interface, GROMACS topology file generation, molecular dynamics simulation to generate CEP files, molecular superposition, and descriptor generation. Data processing and modeling proceed as follows: data reading, truncation and variance filtering, training set and test set partitioning, data processing pipeline, grid search to determine hyperparameters, descriptor visualization, and final modeling. The QSAR-KING program is designed to simplify user operations so that users can complete modeling tasks quickly and avoid erroneous results caused by misoperation. It can run fully automatically on
files prepared by the user, provide results and related charts when the run completes, and give a visualization of the descriptors in three-dimensional space. The MIA-QSAR program QSAR-QUEEN is designed to construct a model from two-dimensional images of the compounds. By reading the pixel values of each compound structure image, a 2D-QSAR model with a high signal-to-noise ratio is established. The program design of QSAR-QUEEN covers the following steps: running environment design, program input interface, and structure image alignment. The compound structure images are aligned automatically using an image-matrix minimum mean error algorithm that we designed. Data processing and modeling proceed as follows: data reading, training set and test set partitioning, descriptor visualization, grid search, and model building. The QSAR-QUEEN program is designed to be lightweight and easier to use. It can build models quickly and automatically on any operating system that supports Python (Windows, Linux, Mac OS) and provides reports and charts after the run. The group correlation map can guide drug researchers in designing and optimizing compound structures. Experiments on the datasets showed that the QSAR-QUEEN matrix and the QSAR-KING matrix can be combined directly by column to form a new, larger matrix that contains all the information of the two original matrices of 4D and MIA descriptors. Following the data processing steps of QSAR-KING, the QSAR-ROYALTY program processes this combined matrix and establishes the regression model. The QSAR-ROYALTY program is equivalent to extending the descriptors of each sample, and it constructs a new model that outperforms the two separate models. The performance of the three QSAR programs was tested on pharmacological activity datasets of Btk inhibitors, AChE inhibitors, and GPb inhibitors. The QSAR-KING model performs better than the QSAR-QUEEN model on all three datasets. This is because the datasets can better reflect
the differences between compounds when three-dimensional electrostatic field and steric field descriptors are used. However, the advantage of QSAR-QUEEN is that it introduces almost no additional noise, especially for datasets in which the two-dimensional structures already capture the differences between all compounds. The QSAR-ROYALTY model is significantly better than either the QSAR-KING or the QSAR-QUEEN model because it selects the valuable parts of the information from both models. The results also confirm that the three programs meet their original design goals. The QSAR-QUEEN, QSAR-KING, and QSAR-ROYALTY programs designed in this dissertation can be downloaded for free at https://github.com/masgils. |
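
The image alignment step described for QSAR-QUEEN can be illustrated with a minimal sketch. It is not the program's actual code: it assumes that "minimum mean error" means an exhaustive search over integer translations for the shift with the lowest mean squared pixel error against a reference image, and that empty canvas pixels are white (255). The function names `shift_image`, `mean_squared_error`, and `align_to_reference` are hypothetical.

```python
from itertools import product

BACKGROUND = 255  # assumed pixel value of the empty (white) canvas


def shift_image(img, dx, dy, fill=BACKGROUND):
    """Return a copy of img translated by (dx, dy); uncovered pixels get `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def mean_squared_error(a, b):
    """Mean squared pixel difference between two equally sized images."""
    h, w = len(a), len(a[0])
    total = sum((a[y][x] - b[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (h * w)


def align_to_reference(ref, img, max_shift=2):
    """Try every integer shift up to max_shift and keep the lowest-error one."""
    best = None
    for dy, dx in product(range(-max_shift, max_shift + 1), repeat=2):
        candidate = shift_image(img, dx, dy)
        err = mean_squared_error(ref, candidate)
        if best is None or err < best[0]:
            best = (err, dx, dy, candidate)
    return best


# Toy 5x5 "structure images": the same 2x2 dark block drawn at two positions.
ref = [[BACKGROUND] * 5 for _ in range(5)]
img = [[BACKGROUND] * 5 for _ in range(5)]
for y, x in product(range(1, 3), repeat=2):
    ref[y][x] = 0          # block at rows 1-2, columns 1-2
    img[y + 1][x + 2] = 0  # same block moved down by 1 and right by 2

err, dx, dy, aligned = align_to_reference(ref, img)
print(dx, dy, err)
```

Real structure images would need a larger search window and probably a faster search strategy; the exhaustive loop is used here only because it makes the minimum-error criterion explicit.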
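
The column-wise combination behind QSAR-ROYALTY, followed by the variance filtering step listed for QSAR-KING, can be sketched as follows. This is an illustration under stated assumptions rather than the program's implementation: the helper names `hstack` and `variance_filter`, the zero variance threshold, and the toy descriptor values are all hypothetical, and the truncation step is omitted.

```python
from statistics import pvariance


def hstack(a, b):
    """Join two descriptor matrices column-wise; rows must be the same compounds."""
    if len(a) != len(b):
        raise ValueError("both matrices need one row per compound, in the same order")
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def variance_filter(matrix, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`."""
    columns = list(zip(*matrix))
    kept = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    filtered = [[row[j] for j in kept] for row in matrix]
    return filtered, kept


# Hypothetical toy data: 3 compounds, 3 grid (4D) descriptors, 2 pixel (MIA) descriptors.
grid_descriptors = [[0.1, 5.0, 1.2], [0.4, 5.0, 0.9], [0.3, 5.0, 1.5]]
pixel_descriptors = [[255, 0], [255, 128], [255, 64]]

combined = hstack(grid_descriptors, pixel_descriptors)
filtered, kept = variance_filter(combined)
print(len(combined[0]), kept)
```

Constant columns, such as a grid point that no compound reaches or a pixel that is background in every image, carry no information for regression, which is why they are dropped before the training and test sets are partitioned.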