
Classification Of The Molecular Structure Of New Coronavirus Drugs Based On Artificial Intelligence

Posted on: 2022-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: K L Zhao
Full Text: PDF
GTID: 2504306782451894
Subject: Automation Technology
Abstract/Summary:
The novel coronavirus outbreak has had a large and growing impact, and mutations of the virus have made vaccine design difficult. Drug repurposing (finding new uses for existing drugs) is therefore of great significance for treating patients with COVID-19 and for designing drug candidates. Following this principle, candidate drugs can be selected from existing anti-COVID-19 drugs using immunology and drug molecular design methods, and quantitative structure-activity relationships can be used to study the relationship between molecular structure and activity. Existing work on the novel coronavirus focuses on the epidemiological analysis of virus transmission, the protein structure and genetic composition of the virus, and the clinical use of drugs. There is still relatively little research on the relationship between existing drugs and their molecular structural characteristics, even though the molecular structure information of COVID-19 drugs can guide the design of new drugs and the development of vaccines.

Hydrogen bonds play an important role in drug design. Their presence affects the topology and properties of a molecule, and the number of hydrogen bond donors affects how the drug is metabolized, which provides a basis for the rational design of drug molecules. Advances in computer-aided drug design and artificial intelligence have also accelerated research on the COVID-19 epidemic. Based on the information about COVID-19 drugs provided by the PubChem database, this thesis extracts 3D molecular structure features of the compounds from their SMILES representations, uses molecular structure descriptors to build an artificial intelligence classification model, and analyzes the relationship between the properties of different drug molecules and hydrogen bonds.

Choosing reasonable and effective molecular descriptors is essential for explaining the relationship between drugs and molecular structures. Because there are thousands of molecular structure descriptors, suitable feature attributes must be selected from among them. This thesis uses the chi-square test to filter the features: the descriptors are ranked by the chi-square value that measures their association with the hydrogen bond donor category, and the top ten are selected.

Because different compounds have different numbers of heavy atoms, the molecular feature matrices extracted from them have different dimensions. To solve this problem, this thesis proposes a frequency domain interpolation method based on the two-dimensional discrete cosine transform (2D DCT). The maximum number of heavy atoms among the compounds in the data set is computed, and the molecular structure feature matrix of each compound is treated as an image and interpolated in the frequency domain with the 2D DCT according to the number of heavy atoms, so that the feature dimensions of different compounds are unified. The proposed method retains more molecular structure feature information and can improve the classification performance of the model. The reliability of the 2D DCT frequency domain interpolation method is verified with a statistical learning method: the calculated intraclass correlation coefficient (ICC) is higher than 0.75, indicating that the method is highly reliable.

Because the resulting molecular structure features are high-dimensional, three feature dimensionality reduction methods and three classical classification models are built and compared, and grid search is used to tune the hyperparameters. The experimental results show that the 2D DCT-based feature dimension unification method effectively solves the problem of inconsistent feature dimensions, and that principal component analysis combined with a random forest effectively improves the classification accuracy of the model.
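The abstract does not spell out how the frequency domain interpolation works, so the following is only a minimal sketch of one common way to resize a matrix with the 2D DCT: transform, zero-pad the coefficients to the target size, and invert. The function names `dct_matrix` and `dct_resize`, the square atom-by-atom shape of the feature matrix, and the amplitude rescaling are assumptions made for illustration, not details taken from the thesis.

```python
from typing import Tuple

import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C of size n x n, so C @ v is the DCT of v."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalisation
    return c


def dct_resize(x: np.ndarray, out_shape: Tuple[int, int]) -> np.ndarray:
    """Resize a 2D matrix by zero-padding (or truncating) its 2D DCT coefficients."""
    rows, cols = x.shape
    out_rows, out_cols = out_shape

    # Forward 2D DCT: transform along rows, then along columns.
    coeffs = dct_matrix(rows) @ x @ dct_matrix(cols).T

    # Keep the low-frequency block; the remaining coefficients stay zero.
    padded = np.zeros((out_rows, out_cols))
    r, c = min(rows, out_rows), min(cols, out_cols)
    padded[:r, :c] = coeffs[:r, :c]

    # Rescale so that a constant matrix keeps its value after resizing.
    padded *= np.sqrt((out_rows * out_cols) / (rows * cols))

    # Inverse 2D DCT at the new size (the transpose inverts an orthonormal matrix).
    return dct_matrix(out_rows).T @ padded @ dct_matrix(out_cols)


# Hypothetical per-compound feature matrices with 4, 6 and 9 heavy atoms.
rng = np.random.default_rng(0)
matrices = [rng.random((n, n)) for n in (4, 6, 9)]
max_atoms = max(m.shape[0] for m in matrices)
unified = [dct_resize(m, (max_atoms, max_atoms)) for m in matrices]
```

After this step every matrix in `unified` has the same shape and can be flattened into a fixed-length feature vector. Zero-padding in the DCT domain adds no new high-frequency content, which is one reason such an approach may preserve more structure than padding the matrix itself with zeros.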
Keywords/Search Tags:Molecular structure, Novel coronavirus drugs, 2D discrete cosine transform, Hydrogen bond donor, Random forest
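As a rough illustration of the chi-square ranking and of the PCA plus random forest classifier tuned by grid search that the abstract mentions, the sketch below uses scikit-learn on synthetic stand-in data. The data set, the three-class label, the min-max scaling before the chi-square test, and the parameter grid are all assumptions; the thesis does not state which library, settings, or hyperparameter ranges were used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for descriptor values (X) and a hydrogen bond donor
# category (y); real data would come from the PubChem compounds.
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=8, n_classes=3, random_state=0
)

# Chi-square ranking: chi2 requires non-negative inputs, so scale to [0, 1] first.
X_pos = MinMaxScaler().fit_transform(X)
selector = SelectKBest(score_func=chi2, k=10).fit(X_pos, y)
top10 = np.argsort(selector.scores_)[::-1][:10]  # indices of the ten highest scores

# PCA followed by a random forest, with hyperparameters tuned by grid search.
pipe = Pipeline([
    ("pca", PCA()),
    ("rf", RandomForestClassifier(random_state=0)),
])
param_grid = {
    "pca__n_components": [5, 10],
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 5],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
```

Here `top10` holds the column indices of the ten best-ranked features, while `search.best_params_` and `search.best_score_` report the chosen settings and their cross-validated accuracy. The two steps are shown independently because the abstract does not say how they are chained.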