Research On The Prediction Of Drug-Target Interactions Based On Machine Learning

Posted on: 2019-05-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 1364330566463036
Subject: Computer application technology
Abstract/Summary:
Drug target identification is central to modern drug research and development. It plays an important role in studying drug side effects, finding new uses for existing drugs, and individualized treatment. However, constrained by precision, throughput, and cost, traditional drug target identification methods based on biological experiments are often difficult to carry out. Meanwhile, with the rapid development of information science, intelligent computing technologies such as machine learning, pattern recognition, and data mining have been widely applied in computational biology. Driven by these technologies, computer-aided prediction of drug-target interactions has attracted growing attention as a fast and accurate approach to drug target identification. It uses computer simulation, calculation, and prediction techniques to study the relationship between drug compound molecules and target proteins, and can guide the synthesis of new drugs or the modification of known drug structures, thereby shortening development time, reducing the blind trial and error of new drug development, and lowering research and development costs. As an efficient and low-cost approach, drug-target interaction prediction based on intelligent computing is therefore of great significance for target protein identification, targeted drug development, and the construction of drug-target interaction networks.

Based on the molecular fingerprints of drug compounds and the amino acid sequences of proteins, this dissertation proposes a set of methods for the numerical representation of drug compounds and protein sequences, the objective extraction of feature information, and the parallel prediction of drug-target interactions. The specific research contents are as follows:

1. Numerical characterization of drug compound and protein sequence information. The molecular structure of a drug compound and the amino acid sequence of a protein are usually stored in databases as character strings, which are not suitable for direct processing by intelligent computing algorithms. How to represent this information numerically, without losing its biological properties and in a form that feature extraction algorithms can handle, directly affects the accuracy and performance of drug-target interaction prediction. This dissertation therefore proposes a molecular-fingerprint-based quantification method for the molecular structure of drug compounds and a quantification method for protein amino acid sequences based on a matrix-based low-rank representation. Together they describe the intrinsic nature of drug-target data quantitatively and give the subsequent machine learning algorithms a sound basis for extracting feature information.

2. Objective extraction of drug-target data features. Determining the features used for training, learning, and classification is a very important step in predicting drug-target interactions. For numerically represented drug-target information, extracting representative features efficiently and objectively while keeping the feature dimension as small as possible greatly helps to improve the accuracy and speed of prediction. This dissertation therefore proposes a machine-learning-based feature extraction algorithm for drug-target interactions that automatically and objectively extracts high-level abstract features by minimizing reconstruction error. The result is an optimal feature representation of the drug-target data, which provides a foundation for high-precision prediction by the subsequent classification model.

3. Drug-target feature classifier model. Drug-target feature information contains a large amount of high-dimensional data, which often leads to the curse of dimensionality in classification. On the one hand, high-dimensional data increases computational complexity and the data-processing burden, which negatively affects classification. On the other hand, high-dimensional data is usually sparse and contains a great deal of redundant and even noisy information, which leads to incorrect classification results. This dissertation therefore proposes a rotation forest classifier model based on weight selection, which effectively reduces the data dimension and removes redundant information, improving the accuracy and speed of the classification model.

4. Large-scale drug-target interaction prediction model. Traditional drug-target interaction prediction models usually use only a single classifier and single-modality sample features for classification. Such models struggle to achieve good classification performance and computation speed on large-scale, highly redundant samples. This dissertation therefore proposes a drug-target interaction prediction model based on an ensemble learning system. The model uses a set of base classifiers to learn from different feature data and integrates the prediction results of each classifier using a specific integration strategy. This greatly improves prediction speed while maintaining accuracy, and achieves better predictive performance and generalization ability.
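
The ensemble scheme described in point 4 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not specify the integration strategy or how the feature data are partitioned, so the sketch assumes a nearest-centroid base classifier (a simple stand-in for the rotation forest model), three hypothetical feature views, and majority voting. All class names, function names, and toy values are illustrative assumptions.

```python
from collections import Counter
from math import dist


class NearestCentroid:
    """Stand-in base classifier (assumption): predicts the class whose mean vector is closest."""

    def fit(self, rows, labels):
        counts = Counter(labels)
        sums = {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
        # One centroid (per-column mean) for each class label.
        self.centroids = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }
        return self

    def predict(self, row):
        return min(self.centroids, key=lambda label: dist(row, self.centroids[label]))


def select(row, columns):
    """Pick the columns that make up one feature view."""
    return [row[i] for i in columns]


class ViewEnsemble:
    """Trains one base classifier per feature view and combines them by majority vote."""

    def __init__(self, views):
        self.views = views  # list of column-index lists, one per view (hypothetical split)
        self.models = []

    def fit(self, rows, labels):
        self.models = [
            NearestCentroid().fit([select(r, cols) for r in rows], labels)
            for cols in self.views
        ]
        return self

    def predict(self, row):
        votes = Counter(
            model.predict(select(row, cols))
            for model, cols in zip(self.models, self.views)
        )
        return votes.most_common(1)[0][0]


# Toy drug-target pairs (made-up values): columns 0-1 mimic fingerprint bits,
# columns 2-3 and 4-5 mimic two sets of protein sequence descriptors.
rows = [
    [1, 1, 0.9, 0.8, 0.7, 0.9],
    [1, 0, 0.8, 0.9, 0.8, 0.7],
    [0, 0, 0.1, 0.2, 0.2, 0.1],
    [0, 1, 0.2, 0.1, 0.1, 0.3],
]
labels = [1, 1, 0, 0]  # 1 = interacting pair, 0 = non-interacting pair

ensemble = ViewEnsemble([[0, 1], [2, 3], [4, 5]]).fit(rows, labels)
# The first view disagrees with the other two here, so the vote decides.
print(ensemble.predict([0, 1, 0.9, 0.9, 0.8, 0.8]))
```

An odd number of views is used so that a binary vote cannot tie. Because each base classifier is trained independently on its own view, the training loop could also be run in parallel, which is one plausible reading of the speed benefit the abstract claims.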
Keywords/Search Tags: Drug-Target, Molecular fingerprint, Protein sequence, Machine learning, Rotation forest