
Research On Data Mining Method For Drug-related Information Based On Data Integration

Posted on: 2017-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: F S Chen
GTID: 2308330485469064
Subject: Computer software and theory
Abstract/Summary:
Drug discovery and development is a long and difficult process. Traditional pharmaceutical experiments have long R&D cycles, produce results slowly, and are expensive. Using computational methods, especially recent data mining and machine learning methods, to mine and predict drug-related information is therefore a cost-effective alternative. Following the theme of data mining methods for drug-related information based on data integration, this thesis takes the prediction of drug ATC (Anatomical Therapeutic Chemical) codes and the prediction of drug-pathway interactions as its main research aims, and investigates the effectiveness of data integration and data mining methods on these tasks, which is of significance to drug discovery. The main research content is as follows:

1. To address the lack of ATC-related features, we propose dD-Hybrid, a method based on RandomForest, to predict a drug's ATC code. Its distinguishing characteristic is that it fully uses the existing connections between drugs and domains to construct a drug-domain interaction network. Domain information is then used as a new feature and added to the original features, and the combined features are used to predict the drug's ATC code. Experimental results show that the domain features substantially improve the accuracy of the method. Furthermore, some newly predicted drug-ATC code pairs can be verified in authoritative databases.

2. To address the imbalance between positive and negative samples in the data set and the redundancy among feature variables, we incorporate disease information and propose PU-KNN, a method based on latent semantic analysis, to predict drug-pathway interactions. Its distinguishing characteristic is that we not only add drug-disease and pathway-disease features, calculated by a global correlation method, to the original features, but also use latent semantic analysis to reduce dimensionality and apply PU-KNN to choose proper samples and assign a probability to each sample. Experimental results show that the disease features capture the connections between drugs and pathways well. In addition, we verify the reliability of the newly predicted drug-pathway pairs in a biological sense.

3. To address the low efficiency of models built with a single learning method, we investigate the results of using base classifiers within different ensemble learning methods on the drug-pathway prediction problem. This may provide a useful reference and research ideas for drug-pathway interaction prediction.
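
The feature-combination step described in the first contribution (turning drug-domain links into new features and appending them to the original ones) might be sketched roughly as follows. This is only an illustrative sketch and not the thesis's dD-Hybrid implementation: the binary-indicator encoding, the function names `build_domain_features` and `combine_features`, the drug names, the domain identifiers, and the feature values are all assumptions made up for the example.

```python
from typing import Dict, List, Set, Tuple


def build_domain_features(
    drug_domain_pairs: List[Tuple[str, str]],
    drugs: List[str],
) -> Tuple[List[List[int]], List[str]]:
    """Encode drug-domain links as one binary indicator column per domain.

    Assumption: a simple 0/1 encoding; the thesis does not state how the
    drug-domain interaction network is converted into features.
    """
    domains = sorted({domain for _, domain in drug_domain_pairs})
    linked: Dict[str, Set[str]] = {drug: set() for drug in drugs}
    for drug, domain in drug_domain_pairs:
        if drug in linked:
            linked[drug].add(domain)
    matrix = [
        [1 if domain in linked[drug] else 0 for domain in domains]
        for drug in drugs
    ]
    return matrix, domains


def combine_features(
    original: List[List[float]],
    domain_matrix: List[List[int]],
) -> List[List[float]]:
    """Append the domain columns to each drug's original feature vector."""
    if len(original) != len(domain_matrix):
        raise ValueError("both feature sets must cover the same drugs")
    return [list(o) + list(d) for o, d in zip(original, domain_matrix)]


# Hypothetical toy data: three drugs, two original features each.
drugs = ["drugA", "drugB", "drugC"]
original_features = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5]]
pairs = [("drugA", "PF00001"), ("drugB", "PF00001"), ("drugB", "PF00069")]

domain_matrix, domain_names = build_domain_features(pairs, drugs)
combined = combine_features(original_features, domain_matrix)
```

Under these assumptions, `combined` holds one row per drug with the original features followed by the domain indicators. Such a matrix could then be passed, together with ATC labels, to a random forest classifier (for example scikit-learn's `RandomForestClassifier`), in line with the RandomForest basis the abstract mentions.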
Keywords/Search Tags:data mining, data integration, drug-ATC, sample imbalance, drug-disease-pathway, ensemble learning