| Organic reactions are widely used in drug synthesis, agricultural production, materials design, and other fields, and metal-catalyzed coupling reactions are an important research direction within organic chemistry. However, the routes of such reactions are usually complicated, and traditional experiments rely on extensive manual trial and error, which is time-consuming and costly; the toxic by-products of these reactions can also cause serious environmental pollution. Artificial intelligence offers a distinct advantage in reducing cost and increasing efficiency, and its performance depends chiefly on how the data are represented and how interpretable the results are. On this basis, this thesis takes an important metal-catalyzed coupling reaction, the palladium-catalyzed Buchwald-Hartwig coupling reaction, as an example and builds two intelligent prediction and analysis systems for the yield of organic reactions (yield for short): one based on a highly interpretable boosting tree model, and one based on a convolutional neural network with strong representation ability. Attention is paid to the interpretability of the models so that, alongside accurate prediction, the experimenters receive richer decision-making information. The specific work is as follows: (1) A yield prediction and analysis system is constructed based on the boosting tree model CatBoost. To reduce data redundancy and computational cost, and because features and model are closely coupled, feature selection should retain the features that are useful for the machine learning task. A model-dependent recursive feature elimination algorithm is therefore used to obtain a comprehensive yet concise feature subset. On this basis, a CatBoost regression model is established to predict the reaction yield, achieving satisfactory prediction accuracy. The model is further analyzed in terms of parameter optimization, convergence performance, prediction accuracy, time complexity, and generalization performance. Finally, three different interpretability methods are used to analyze the internal relationship between reaction conditions and yield, providing more valuable decision-making information for experimental personnel. (2) An intelligent prediction and analysis system is constructed based on the ensemble boosting tree CatBoost and an attention-driven lightweight convolutional neural network. Based on the characteristics of the data, a lightweight convolutional neural network is designed, and a lightweight attention module is added to focus on key features without significantly increasing model complexity. The network is then used as a feature extractor, and CatBoost regression is applied to the extracted abstract features to obtain the predicted yield. The results show that the hybrid model achieves higher prediction accuracy and better prediction performance, and conforms to the concept of green chemistry. |
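
The model-dependent recursive feature elimination mentioned in (1) can be illustrated with a minimal sketch. It is not the thesis implementation. To stay dependency-free, a least-squares linear model fitted by gradient descent stands in for CatBoost, and the absolute weight of a feature stands in for the model's feature importance. The synthetic data, the function names `fit_linear` and `recursive_feature_elimination`, and all hyperparameters are assumptions made for illustration only.

```python
import random


def fit_linear(X, y, cols, epochs=300, lr=0.05):
    """Fit least-squares weights on the chosen columns by batch gradient descent.

    Stand-in for the real model (CatBoost in the thesis); assumes the
    features are roughly standardized so a fixed learning rate converges.
    """
    n = len(X)
    w = [0.0] * len(cols)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(cols)
        grad_b = 0.0
        for row, target in zip(X, y):
            err = b + sum(wj * row[c] for wj, c in zip(w, cols)) - target
            grad_b += err
            for j, c in enumerate(cols):
                grad_w[j] += err * row[c]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Refit the model repeatedly, dropping the least important feature each round."""
    cols = list(range(len(X[0])))
    while len(cols) > n_keep:
        w = fit_linear(X, y, cols)
        # Importance proxy (assumption): magnitude of the fitted weight.
        weakest = min(range(len(cols)), key=lambda j: abs(w[j]))
        del cols[weakest]
    return cols


# Synthetic stand-in for reaction descriptors: six features, of which only
# columns 0 and 3 actually influence the (hypothetical) yield.
random.seed(0)
X = [[random.gauss(0.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [3.0 * row[0] - 2.0 * row[3] + random.gauss(0.0, 0.1) for row in X]

selected = recursive_feature_elimination(X, y, n_keep=2)
print("selected feature indices:", selected)
```

The key property is that importance is recomputed after every elimination, so the ranking always reflects the model trained on the current subset rather than a one-off filter score. With a tree-based regressor, the weight magnitude would be replaced by that model's own feature-importance measure.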