| Compounds are widely used in daily life and inevitably produce organic pollutants, which threaten the balance of the ecological environment and human health. Biodegradability prediction is therefore an active research topic. Although existing classification and prediction methods include feature selection, they do not consider the dynamic change process of feature selection, so the removal of irrelevant and redundant features is inaccurate. At the same time, hyperparameter selection for ensemble algorithms is time-consuming and inefficient. Research on classification and prediction algorithms based on feature selection and hyperparameter optimization therefore has practical significance and application value. This paper proposes an improved LightGBM algorithm based on MD-Medoids feature selection and Sparrow Search Algorithm (SSA) parameter optimization and applies it to biodegradability prediction. The main work is as follows. First, to account for the dynamic change process of feature selection, this paper proposes the MD-Medoids feature selection algorithm. The features of the data set are divided into two categories, selected features and candidate features. Mutual information is computed to measure the correlation among the candidate features, from which the selected features are obtained, and dynamic correlation is used to measure the correlation between the selected features and the class label. Both kinds of correlation information are then used as input to the k-medoids clustering algorithm, which clusters all the features so that redundant and irrelevant features are removed effectively. Finally, the method is verified on five data sets. The results show that the MD-Medoids feature selection algorithm performs well in accuracy, precision, F1 and AUC. Second, to address the hyperparameter optimization problem of the LightGBM
algorithm, this paper proposes the SSA-LightGBM classification prediction algorithm, which uses SSA to improve LightGBM. Eight parameters that strongly affect the LightGBM algorithm are selected and optimized by SSA to determine their optimal combination. Experiments are carried out on five data sets and compared with SVM, XGBoost, RF and LightGBM. The proposed SSA-LightGBM classification prediction algorithm obtains the best accuracy, F1 and AUC values. Third, this paper combines the MD-Medoids feature selection algorithm with the SSA-LightGBM classification prediction algorithm into an improved LightGBM algorithm based on MD-Medoids feature selection and SSA parameter optimization, and applies it to the quantitative structure-activity relationship (QSAR) biodegradation data set to classify molecules as ready or not ready biodegradable. Finally, the experimental results are compared with those of SVM, RF, XGBoost, LightGBM and SSA-LightGBM. The results show that the proposed algorithm achieves the best accuracy, precision, F1 and AUC. |
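
The feature selection idea described above (mutual information plus k-medoids clustering of the features) can be illustrated with a minimal sketch. It is not the MD-Medoids algorithm itself: the abstract does not define the dynamic correlation term, so it is omitted here. The function names, the distance `1 - normalized mutual information`, the farthest-first initialisation and the `min_relevance` threshold are all assumptions made for illustration, and the features are assumed to be discrete with `k` no larger than the number of features.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def feature_distance(a, b):
    """Assumed distance: 0 for fully redundant features, 1 for independent ones."""
    h = max(mutual_information(a, a), mutual_information(b, b))  # I(X;X) = H(X)
    return 0.0 if h == 0 else 1.0 - mutual_information(a, b) / h


def assign(dist, medoids):
    """Group feature indices by nearest medoid (ties go to the earlier medoid)."""
    clusters = {m: [] for m in medoids}
    for j in range(len(dist)):
        clusters[min(medoids, key=lambda m: dist[j][m])].append(j)
    return clusters


def select_features(features, labels, k, min_relevance=1e-9, max_iter=50):
    """Cluster features with k-medoids, then keep one relevant feature per cluster."""
    p = len(features)
    relevance = [mutual_information(f, labels) for f in features]
    dist = [[feature_distance(features[i], features[j]) for j in range(p)]
            for i in range(p)]

    # Farthest-first initialisation, seeded with the most relevant feature.
    medoids = [max(range(p), key=relevance.__getitem__)]
    while len(medoids) < k:
        medoids.append(max((j for j in range(p) if j not in medoids),
                           key=lambda j: min(dist[j][m] for m in medoids)))

    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid = member with the smallest total distance to its cluster;
        # an empty cluster keeps its old medoid.
        updated = [min(members, key=lambda c: sum(dist[c][j] for j in members))
                   if members else m
                   for m, members in clusters.items()]
        if updated == medoids:
            break
        medoids = updated

    clusters = assign(dist, medoids)
    # Keep the most label-relevant member of each cluster (drops redundant ones),
    # then discard representatives that carry no label information (irrelevant).
    picks = [max(members, key=relevance.__getitem__)
             for members in clusters.values() if members]
    return sorted(i for i in picks if relevance[i] > min_relevance)


# Toy data: four binary features over eight samples.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: informative
    [1, 1, 1, 1, 0, 0, 0, 0],  # f1: inverted copy of f0, hence redundant
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: independent of the label
    [0, 0, 1, 1, 0, 0, 1, 1],  # f3: independent of the label
]
selected = select_features(X, y, k=3)
```

On this toy data the sketch keeps only feature 0: feature 1 falls into the same cluster as feature 0 and is dropped as redundant, while features 2 and 3 form their own clusters but share no information with the label and are dropped as irrelevant. Continuous descriptors, such as those in QSAR data, would need to be discretized first or handled with a different mutual information estimator.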