
Research On Feature Screening And Regression Prediction Of Ligand Bioactivities Via Deep Learning

Posted on: 2019-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Zhang
Full Text: PDF
GTID: 2428330566999275
Subject: Electronic and communication engineering
Abstract/Summary:
Feature selection and deep learning are significant branches of machine learning. Data with various structures and scales come from almost every aspect of daily life, and it is always desirable to extract patterns from such data effectively and to build interpretable models with high prediction accuracy. Feature selection is a popular technique for identifying important explanatory features. Recently, deep learning has attracted researchers' attention through its success on a variety of machine learning tasks. One of the key points of deep learning is the discovery of features: by discovering relationships in the data set, features can be found more accurately, and by adding layers to increase the complexity of the network, higher-level features (features concerned less with the structure and more with the content of the data) can be extracted.

The main contributions of this thesis are as follows:

(1) An effective LASSO screening rule based on Enhanced Dual Polytope Projections is proposed to screen massive features for ligands. The rule detects most inactive features and speeds up the screening process. Because the subsequent learning process only needs to build a model on a small number of features, the learning efficiency of the model is greatly improved. The effectiveness of the method is validated by experimental results.

(2) A new method, WDL-RF, which combines weighted deep learning and random forest, is proposed to model the bioactivity of GPCR-associated ligand molecules. The pipeline consists of two consecutive stages: molecular fingerprint generation through a new weighted deep learning model, and bioactivity calculation with a random forest model. A distinctive feature of the approach is that it allows end-to-end learning of the prediction pipeline. The experimental results show that our models perform best on all datasets and under all evaluation criteria.
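The screening idea in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the thesis's own formulation, so the code below implements the simpler basic Dual Polytope Projection (DPP) rule anchored at lambda_max, not the enhanced EDPP rule. It assumes the LASSO objective 0.5 * ||y - X b||^2 + lam * ||b||_1. The names `dpp_screen`, `columns`, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm of a vector."""
    return math.sqrt(dot(u, u))


def dpp_screen(columns, y, lam):
    """Basic DPP screening for 0.5*||y - X b||^2 + lam*||b||_1 (assumed objective).

    columns[j] holds the values of feature j over all samples.
    Returns (keep, lam_max); keep[j] is False when feature j is
    guaranteed to have a zero coefficient at this value of lam.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    correlations = [dot(col, y) for col in columns]
    # Smallest penalty at which every coefficient is zero.
    lam_max = max(abs(c) for c in correlations)
    if lam >= lam_max:
        return [False] * len(columns), lam_max
    # Bound on how far the dual optimum can move between lam_max and lam.
    radius = norm(y) * (1.0 / lam - 1.0 / lam_max)
    keep = []
    for col, c in zip(columns, correlations):
        # Discard feature j when |x_j^T y| / lam_max < 1 - ||x_j|| * radius.
        discard = abs(c) / lam_max < 1.0 - norm(col) * radius
        keep.append(not discard)
    return keep, lam_max


# Toy example (made-up data): three orthonormal features, four samples.
columns = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
y = [3.0, 0.1, 0.05, 0.0]
keep, lam_max = dpp_screen(columns, y, lam=2.9)
print(keep, lam_max)
```

In this toy design only the first feature survives screening at `lam=2.9`, so a downstream LASSO solver would need to fit a single feature. The rule is safe in the sense that it never discards a feature with a nonzero coefficient, but it becomes less selective as `lam` moves further below `lam_max`, which is the weakness that enhanced rules such as EDPP are designed to address.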
Keywords/Search Tags:Feature Selection, Massive Features, Deep Learning, LASSO, Random Forest