Virtual Screening Of Drug Protein Based On Imbalance Data Classification Model

Posted on: 2018-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: W Chen
GTID: 2310330512473317
Subject: Computer Science and Technology
Abstract/Summary:
With the completion of the Human Genome Project, theoretical research in bioinformatics, biochemistry, and related fields continues to deepen, and the methods and technologies of drug discovery are constantly being updated. Because computers process information so efficiently, pattern recognition, machine learning, and similar methods have gradually entered the field of drug discovery. Computer-aided drug design (CADD), high-throughput screening, and the development and improvement of biochip technology provide many new and powerful tools for drug discovery and greatly broaden its avenues. Virtual screening based on molecular docking is one of the important methods of computer-aided drug design, and because of its broad applicability it has been adopted by many research institutions and pharmaceutical companies.

The accuracy of this strategy, however, relies heavily on the accuracy of the scoring function. On the one hand, research on scoring functions is still limited by the available theory and methods, so no fully accurate approach exists yet. On the other hand, inactive candidate compounds make up a large proportion of the data in virtual screening, and incorrect docking conformations affect the experimental results. This is therefore a typical imbalanced data classification problem: the imbalance of the data set biases the screening result toward the negative class, which reduces the accuracy of the screening results.

Against this background, this thesis proposes virtual screening of drug proteins based on an imbalanced data classification model. The method combines virtual screening technology with imbalanced data classification to improve the traditional virtual screening process based on molecular docking.

First, in the traditional virtual screening process, the inaccuracy of the scoring function causes molecular docking conformations to be misjudged, which makes the screening performance for lead compounds very low. To solve this problem, this thesis encodes each molecular docking conformation as a Pharm-IF interaction fingerprint, which serves as the input of the classification algorithm. The one-dimensional Pharm-IF represents the intermolecular interactions; it not only replaces the scoring function but also lends itself to sampling and classification of the data sets.

Second, in actual virtual screening the proportion of inactive compounds is high, and the large number of incorrect docking conformations causes data imbalance. Several characteristics of imbalanced data lower the quality of lead compound screening: the data flooding phenomenon caused by a skewed classification boundary, the insufficient information carried by minority class data, and the loss of useful information after sampling. To address these problems, a cluster boundary sampling method is adopted at the data processing level: the imbalance ratio is reduced while as much useful information as possible is preserved, which helps improve the generalization performance of the classifier. At the classification algorithm level, ensemble learning is introduced: multiple weak classifiers are combined into a strong classifier through multi-layer iteration. This improves the stability of the classifier and optimizes the traditional virtual screening process.

Finally, in the experimental part, the PDB database and the StARLITe database are used to verify the validity of the proposed method. The experimental results show that the proposed method effectively improves the accuracy of virtual screening and can offer some guidance for the development of new drugs.
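
The abstract does not spell out the cluster boundary sampling procedure, so the following is only a minimal sketch of one plausible reading, not the author's algorithm. It assumes that inactive (majority) docking conformations are first grouped with plain k-means and that, within each cluster, the samples lying closest to the active (minority) class are kept as boundary samples. The function names, the Euclidean distance, the parameters `k` and `ratio`, and the representation of each fingerprint as a list of numbers are all illustrative assumptions.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels


def cluster_boundary_undersample(majority, minority, k=3, ratio=1.0, seed=0):
    """Undersample the majority class (hypothetical helper).

    Keeps roughly ratio * len(minority) majority samples. Every non-empty
    cluster contributes a quota proportional to its size, filled with the
    members nearest to any minority sample (the assumed "boundary").
    """
    target = min(len(majority), max(k, round(ratio * len(minority))))
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        quota = max(1, round(target * len(members) / len(majority)))
        # Sort by distance to the nearest minority sample, closest first.
        members.sort(key=lambda p: min(dist(p, m) for m in minority))
        kept.extend(members[:quota])
    return kept
```

Under these assumptions, the reduced majority set returned by `cluster_boundary_undersample`, together with all minority samples, would form the rebalanced training set passed to the ensemble classifier. Because the per-cluster quotas are rounded, the number of retained samples is approximate rather than exact.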
Keywords/Search Tags:Virtual screening, Machine learning, Cluster sampling, Ensemble learning