
Design And Implementation Of Mass Data Storage And Result Reduction For Virtual Screening Based On Hadoop

Posted on: 2013-10-22 | Degree: Master | Type: Thesis
Country: China | Candidate: D W Chen
GTID: 2248330371987273 | Subject: Computer technology
Abstract/Summary:
Traditional virtual screening and grid-based virtual screening require chemists to upload small-molecule files and collect result data manually. This makes it difficult to automate docking and result collection, and it adds to the chemists' workload.

This thesis draws on the National Natural Science Foundation major project "Research and Demonstration Applications of E-Science Computational on Chemistry" and on "Research on Large-Scale Virtual Screening of Anti-H5N1 Drugs within the Internet Environment". Taking advantage of the mass data storage offered by the Hadoop platform, we designed and implemented mass data storage and result processing for virtual screening in order to automate the screening workflow. The research work consists of two parts. First, we built a Hadoop platform that provides massive data storage and job management; it meets the demands of managing large volumes of original small-molecule data, docking results, and jobs in large-scale virtual screening. Second, we used the MapReduce programming framework to run molecular docking in parallel and to preprocess the result files.

This work builds a platform for research on large-scale drug virtual screening and mass data storage, contributing to drug discovery in the cloud environment. Furthermore, it will promote the development of e-science in chemistry.
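The abstract does not specify the docking program, file formats, or code used for result preprocessing. Purely as an illustration of how a MapReduce result-reduction step can work, the sketch below assumes AutoDock Vina-style PDBQT output (lines such as `REMARK VINA RESULT:    -7.5  0.000  0.000`, where the first number is the binding affinity in kcal/mol) and a hypothetical record layout of `<ligand_id><TAB><pdbqt line>`. The mapper emits one `ligand_id<TAB>score` pair per docking pose, and the reducer keeps the best (lowest) affinity per ligand. The function names `mapper`, `reducer`, `run_locally`, and `stream` are hypothetical, and the sketch is not the thesis's implementation.

```python
"""Hypothetical Hadoop Streaming-style reduction of docking results.

Assumption: input lines look like "<ligand_id>\t<PDBQT line>" and scores come
from AutoDock Vina "REMARK VINA RESULT:" lines (not confirmed by the thesis).
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator, List

VINA_PREFIX = "REMARK VINA RESULT:"


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'ligand_id<TAB>affinity' for every Vina result remark."""
    for line in lines:
        ligand_id, _, record = line.rstrip("\n").partition("\t")
        if record.startswith(VINA_PREFIX):
            # Tokens: REMARK, VINA, RESULT:, affinity, rmsd_lb, rmsd_ub
            affinity = float(record.split()[3])
            yield f"{ligand_id}\t{affinity}"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Keep the lowest (best) affinity per ligand; input must be grouped by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for ligand_id, group in groupby(pairs, key=lambda kv: kv[0]):
        best = min(float(value) for _, value in group)
        yield f"{ligand_id}\t{best}"


def run_locally(lines: Iterable[str]) -> List[str]:
    """Simulate map -> shuffle/sort by key -> reduce on one machine."""
    mapped = sorted(mapper(lines), key=lambda s: s.split("\t", 1)[0])
    return list(reducer(mapped))


def stream(mode: str) -> None:
    """Entry point for Hadoop Streaming: 'map' or 'reduce' over stdin/stdout."""
    step = mapper if mode == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Under Hadoop Streaming, the same script could be passed to the `-mapper` and `-reducer` options with a small wrapper that calls `stream("map")` or `stream("reduce")`; the framework performs the sort by key that `run_locally` imitates. Taking the minimum is a design choice that fits Vina's convention that more negative affinities indicate stronger binding; a real pipeline might instead keep the top N poses per ligand.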
Keywords/Search Tags:Hadoop, Large-Scale Virtual Screening, HDFS, MapReduce