
Research On Ligand-based Virtual Screening Methods For Targeting GPCRs Through Sparse Learning And Deep Learning

Posted on: 2020-05-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2428330590995674
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of Internet technology, data of many kinds are becoming more standardized and more quantified. The resulting mass of high-dimensional features poses unprecedented challenges to data mining and machine learning algorithms in practical applications such as natural language processing, computer vision, and genetic engineering. Massive data and high-dimensional features in the era of big data inevitably lead to the "curse of dimensionality" and to over-fitting, which not only degrade the performance of machine learning algorithms but also make the time and space complexity of the computation grow exponentially. Rapid and effective feature screening is therefore an important task in data processing. If redundant and irrelevant features can be removed from the original data set, so that the model is built only on the most representative and research-relevant subset of key features, the generalization performance of the model can be improved and the time and space complexity of the computation significantly reduced.

We designed a new method, SED, which fuses screening for Lasso of long extended-connectivity fingerprints (ECFPs) with deep neural networks, to predict the bioactivities of ligand molecules and to recognize the key substructures that interact with GPCRs. The SED workflow consists of three successive steps: 1) representation of ligand molecules as long ECFPs, 2) feature selection by screening for Lasso of the ECFPs, and 3) bioactivity prediction through a deep neural network regression model. The outstanding characteristic of our approach is that the model can extract accurate substructures from long ECFPs and significantly improve predictive performance. SED was assessed on sixteen GPCR datasets that cover most subfamilies of human GPCRs, each with 300–5000 ligand associations. The results show that SED achieves excellent performance in predicting and interpreting ligand bioactivities, especially for GPCR datasets without sufficient ligand associations, where it reaches an average improvement of 12% in correlation coefficient (r^2) and 19% in root mean square error over the baseline predictors.
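
The feature-selection step and the two reported metrics can be illustrated with a minimal sketch. The abstract does not say which screening rule SED uses, so the code assumes the basic SAFE rule for the Lasso objective 0.5*||y - Xb||^2 + lam*||b||_1 as a stand-in. It is applied to fingerprints that are assumed to be already computed as 0/1 bit vectors. The function names (safe_screen, rmse, r_squared) and the toy data are hypothetical, and r_squared computes the squared Pearson correlation, which is only one possible reading of the r^2 in the abstract.

```python
import math


def safe_screen(X, y, lam):
    """Indices of features kept by the basic SAFE rule (an assumed stand-in).

    X: list of samples, each a list of fingerprint bits (0/1 or floats).
    y: list of activity values, one per sample.
    lam: Lasso penalty for 0.5*||y - Xb||^2 + lam*||b||_1.
    Feature j is discarded when
    |x_j . y| < lam - ||x_j|| * ||y|| * (lam_max - lam) / lam_max.
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    y_norm = math.sqrt(sum(v * v for v in y))
    corr = [abs(sum(c * t for c, t in zip(col, y))) for col in cols]
    lam_max = max(corr)
    if lam >= lam_max:
        return []  # every Lasso coefficient is zero at this penalty
    kept = []
    for j, col in enumerate(cols):
        col_norm = math.sqrt(sum(v * v for v in col))
        threshold = lam - col_norm * y_norm * (lam_max - lam) / lam_max
        if corr[j] >= threshold:
            kept.append(j)
    return kept


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted activities."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


# Toy data (hypothetical): 4 ligands, 3 fingerprint bits, centred activities.
X_toy = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y_toy = [2.0, 2.0, -2.0, -2.0]
print(safe_screen(X_toy, y_toy, lam=3.5))  # -> [0, 1]
```

In this toy case bit 2 is uncorrelated with activity and is screened out, while bits 0 and 1 survive. In a pipeline of the kind the abstract describes, the surviving bit indices would define the reduced input to the regression network, and each surviving bit could be traced back to a substructure. The SAFE rule is conservative: at a small penalty such as lam=1.0 it keeps all three bits.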
Keywords/Search Tags:Feature Selection, LASSO, Deep Learning, Virtual Screening