
Construction Of Genome Analysis Platform Based On Hadoop

Posted on: 2016-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Bao
GTID: 2308330482961021
Subject: Software engineering
Abstract/Summary:
With the rapid development of high-throughput sequencing technology and the ever-growing volume of data processed in bioinformatics, mining valuable knowledge and patterns from massive experimental data has become a research hotspot in the field. Cloud computing, itself a research hotspot in China and abroad, is the latest outcome of developments in grid computing, parallel computing and distributed computing. Through cloud computing, the computing power, storage capacity and infrastructure of a computing cloud can be accessed conveniently over the Internet, which makes it well suited to analyzing and processing the massive data sets encountered in bioinformatics. This research focused mainly on the open-source cloud computing platform Hadoop and, building on open-source bioinformatics storage and processing tools, carried out the following work.

1. Construction and study of the basic Hadoop platform
Hadoop Common, the Hadoop Distributed File System (HDFS) and the MapReduce computational framework were studied. A Hadoop cluster and the genome analysis platform were built inside Docker containers, giving practical command of building a scalable, efficient and reliable cloud computing infrastructure.

2. Construction of a genome association analysis platform
(1) Construction of a genome database
Through resource integration and standardization, livestock genome information resources were organized in HDFS in combination with a MySQL database.
(2) Integration of Hadoop-based genome analysis tools
The most common applications were covered, including genome sequence assembly, genome sequence alignment, genome-wide association analysis and sequence analysis. These applications were combined and packaged with open-source cloud computing bioinformatics tools such as Contrail, CloudBurst and Myrna, so that a unified Hadoop-based model for running and monitoring them was constructed.
(3) Development of a Web application
Using Java and Java Web technologies, and guided by users' requirements for genome analysis, the genome database and the analysis tools were packaged into a simple, easy-to-use and transparent Web service. Users are thereby provided with an operating platform offering sequencing, complex queries, genome alignment, association analysis and other functions.
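The MapReduce computational framework studied in this work splits a job into a map phase, a shuffle that groups intermediate key/value pairs by key, and a reduce phase. As a hedged illustration only, the plain-Java sketch below applies that model to k-mer counting over sequencing reads, an operation that de Bruijn-graph assemblers such as Contrail build on. It is not the thesis's implementation: the class and method names (`KmerCountSketch`, `countKmers`, `map`, `shuffle`, `reduce`) are hypothetical, and the shuffle runs in memory. A real Hadoop job would instead subclass Hadoop's `Mapper` and `Reducer` classes and let the framework distribute the shuffle across the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the MapReduce model (map -> shuffle -> reduce) applied to
// k-mer counting. Hypothetical illustration only: no Hadoop classes are used, and
// the shuffle runs in memory instead of across cluster nodes.
public class KmerCountSketch {

    // Map phase: for one read, emit a (k-mer, 1) pair for every k-length substring.
    static List<Map.Entry<String, Integer>> map(String read, int k) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (int i = 0; i + k <= read.length(); i++) {
            out.add(Map.entry(read.substring(i, i + k), 1));
        }
        return out;
    }

    // Shuffle phase: group the emitted values by key. A TreeMap keeps keys sorted,
    // mirroring Hadoop's sorting of map output keys before the reduce phase.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), key -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum all counts collected for one k-mer.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: run map over every read, shuffle, then reduce each key group.
    static Map<String, Integer> countKmers(List<String> reads, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String read : reads) {
            emitted.addAll(map(read, k));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> reads = List.of("ACGTT", "CGTTA");
        // Prints {ACG=1, CGT=2, GTT=2, TTA=1}
        System.out.println(countKmers(reads, 3));
    }
}
```

Each map call depends only on its own read, and each reduce call depends only on the values for one key. That independence lets Hadoop run them in parallel on different nodes, and it is what allows the same logic to scale from this in-memory toy to data sets stored in HDFS.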
Keywords/Search Tags: cloud computing, Hadoop, Bioinformatics