
Research On Drug-target Interaction Method Based On Improved CNN And Unbalanced Data Processing

Posted on: 2022-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Ye
Full Text: PDF
GTID: 2491306317977349
Subject: Software engineering
Abstract/Summary:
Modern drug discovery has provided a variety of clinical treatment options for many diseases and is of great significance for protecting human life and health. Drug discovery, however, is a costly, time-consuming, low-return process. Machine learning and artificial intelligence are changing this process by screening known drugs and targets to find new drug-target interaction pairs, thereby accelerating drug discovery. This thesis addresses the problem of imbalanced data in drug-target interaction studies and proposes an imbalanced-data processing scheme, BS-DTvec. Low-dimensional feature representations of drug molecules (Drug Vec) and target proteins (Target Vec) are obtained separately by word-vector feature extraction. Drug Vec and Target Vec are then combined to represent drug-target interaction features (DTVec). Minority-class samples are synthesized with the Borderline-SMOTE technique to address the overfitting caused by the imbalance of drug-target interaction data sets. The word-vector algorithm, which represents words as real-valued vectors, is well suited to raw text data such as drug molecule and protein sequence information. It takes the raw data as a corpus and learns the relationships between word segments (a word segment is a fragment of a sequence and represents a substructure of a drug molecule or protein target). These relationships carry rich information about drug-target interactions. Expressing this information as real-valued vectors digitizes the originally non-numerical data and facilitates further algorithmic study. On this basis, the thesis proposes a convolutional neural network model, DT-Net, which applies deep convolution to the processed drug-target data, mines the key information of drug-target interactions, and classifies and predicts drug-target interactions. Experiments show that the proposed method effectively improves the data distribution and the classification performance of the model.
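The Borderline-SMOTE step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it is a plain-Python version of the commonly described "borderline-1" variant, run on made-up 2-D points that stand in for DTVec feature vectors. The function name borderline_smote, the parameters m, k and n_new, and the toy data are all assumptions chosen for illustration.

```python
import math
import random


def borderline_smote(X, y, minority=1, m=5, k=3, n_new=1, seed=0):
    """Return synthetic minority samples (borderline-1 style sketch).

    X: list of feature vectors, y: list of class labels.
    m: neighbours used to decide whether a minority sample is borderline.
    k: minority neighbours used for interpolation.
    n_new: synthetic samples generated per borderline sample.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for i in min_idx:
        p = X[i]
        # m nearest neighbours of p in the whole data set (p itself excluded).
        rest = [j for j in range(len(X)) if j != i]
        neigh = sorted(rest, key=lambda j: math.dist(p, X[j]))[:m]
        n_major = sum(1 for j in neigh if y[j] != minority)
        # Borderline ("danger") sample: at least half of the neighbours,
        # but not all of them, belong to the majority class.
        if not (m / 2 <= n_major < m):
            continue
        # k nearest minority-class neighbours of p.
        mates = sorted((j for j in min_idx if j != i),
                       key=lambda j: math.dist(p, X[j]))[:k]
        if not mates:
            continue
        for _ in range(n_new):
            q = X[rng.choice(mates)]
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic


# Toy data (hypothetical): 7 majority points (label 0), 4 minority points (label 1).
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.2, 0.4], [0.4, 0.2], [0.45, 0.45],
     [0.5, 0.5], [0.55, 0.6], [2.0, 2.0], [2.1, 2.0]]
y = [0] * 7 + [1] * 4

synthetic = borderline_smote(X, y, minority=1, m=5, k=3, n_new=2)
for point in synthetic:
    print([round(v, 3) for v in point])
```

In this toy set only the two minority points that sit next to the majority cluster are treated as borderline, so new samples are interpolated from them toward other minority points, while the two minority points far from the class boundary generate nothing. A real pipeline would append the synthetic vectors to the training set with the minority label before training the classifier.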
Keywords/Search Tags: imbalanced data processing, drug-target interaction, Borderline-SMOTE, word vector, convolutional neural network