Research On Data Management And Parallel Docking In Virtual Screening Based On Hadoop

Posted on: 2014-01-22 | Degree: Master | Type: Thesis
Country: China | Candidate: J W Li | Full Text: PDF
GTID: 2248330398468919 | Subject: Computer system architecture
Abstract/Summary:
Virtual screening is a computational technique used in drug discovery: it predicts the activity of compounds and selects promising candidates for testing. Molecular docking is an important method in virtual screening. With the development of structural biology, the number of known compounds and proteins keeps increasing, so virtual screening applications face the dual challenges of mass data storage and large-scale computation. Cloud computing technology offers a new approach to these storage and computing problems.

Apache Hadoop is a mature open-source software framework. Hive, an open-source data warehouse, and MapReduce, a parallel programming model, are two of the main technologies in the Hadoop ecosystem, and together they provide storage and computation for massive-data applications. Using Hadoop and Hive, this thesis addresses two key problems: querying and managing massive data, and docking very large numbers of molecules in parallel. The contents of this thesis include:

1. Building a cloud database for large-scale virtual screening based on Hive, providing query and analysis functions, and optimizing the database with respect to the number of map/reduce tasks and the HQL statements used.
2. Using the MapReduce framework to parallelize molecular docking: investigating the problems of calling the DOCK 6 software from within Hadoop, designing map() to distribute molecular docking tasks, and designing reduce() to combine the map outputs.
3. Testing the virtual screening cloud database with data loading, combined queries, multi-table join queries, sorting queries, and so on.

The research in this thesis provides a demonstration of virtual screening applications based on cloud computing technology.
Keywords/Search Tags:Cloud computing, large-scale virtual screening, molecular docking, Hadoop, Hive, MapReduce
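
The map()/reduce() division described in item 2 can be illustrated with a minimal sketch in plain Python. This is not the thesis's implementation: the tab-separated input format, the function names, and `dock_stub` (a placeholder that returns a fake score instead of calling DOCK 6) are all assumptions made for illustration, and the shuffle step that Hadoop performs between the two phases is simulated in memory.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Result = Tuple[str, float]   # (ligand_id, score)
Pair = Tuple[str, Result]    # (receptor_id, (ligand_id, score))


def dock_stub(receptor_id: str, ligand_id: str) -> float:
    """Hypothetical stand-in for an external docking run.

    Returns a deterministic fake score, not a real docking energy.
    """
    return -float(sum(ord(c) for c in receptor_id + ligand_id) % 100)


def map_docking(records: Iterable[str],
                dock: Callable[[str, str], float] = dock_stub) -> Iterator[Pair]:
    """Map phase: one docking task per input line 'receptor<TAB>ligand'."""
    for line in records:
        receptor_id, ligand_id = line.strip().split("\t")
        yield receptor_id, (ligand_id, dock(receptor_id, ligand_id))


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[Result]]:
    """Group map outputs by key, as the framework would between phases."""
    groups: Dict[str, List[Result]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_docking(receptor_id: str,
                   results: List[Result]) -> Tuple[str, List[Result]]:
    """Reduce phase: combine all results for one receptor, lowest score first.

    Treating a lower score as more favourable is an assumption of this sketch.
    """
    return receptor_id, sorted(results, key=lambda r: r[1])


def run_job(records: Iterable[str],
            dock: Callable[[str, str], float] = dock_stub) -> Dict[str, List[Result]]:
    """Run map, shuffle, and reduce in sequence on in-memory records."""
    groups = shuffle(map_docking(records, dock))
    return dict(reduce_docking(k, v) for k, v in sorted(groups.items()))
```

Each map call is independent of the others, which is what allows the docking tasks to be spread across cluster nodes; the reduce step only has to merge and order the per-receptor results.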