
Integration Of Hadoop And MongoDB For Massive Data Processing

Posted on: 2016-01-09    Degree: Master    Type: Thesis
Country: China    Candidate: Q Zeng    Full Text: PDF
GTID: 2428330473464946    Subject: Software engineering
Abstract/Summary:
With the exponential growth of data volume and variety, NoSQL technology and the MapReduce model for scalable parallel analysis have attracted wide attention. MongoDB is a representative NoSQL database that supports both scalable indexing and flexible queries over massive data, while Hadoop is the most popular open-source implementation of the MapReduce framework for parallel computing. In view of this, we integrate MongoDB and Hadoop into a single platform and build a Mongo-Hadoop system that combines the strengths of both, in order to better handle the storage, computation, and querying of large data sets.

This paper first introduces the basic architecture of Hadoop and MongoDB and examines their working mechanisms in depth. We also compare their respective advantages, deficiencies, and similarities, and reach two conclusions. First, for data computation, MongoDB's built-in MapReduce has serious limitations and cannot satisfy the computation and analysis of complex data. Second, for data storage, HDFS, the distributed file system underlying Hadoop, is designed for high-throughput access and cannot support efficient queries on the data.

To address the first problem, we implemented the Mongo-Hadoop connector, a plug-in through which Hadoop MapReduce can read data from MongoDB and process it efficiently. In a two-node experiment, Hadoop MapReduce achieved on average five times the performance of MongoDB's MapReduce. For the second problem, we built an integration framework based on Hadoop and MongoDB and proposed four different integration schemes for large-scale data processing under different requirements.

Because Mongo-Hadoop integrates MongoDB with Hadoop, cluster deployment and parameter configuration are particularly important for good compatibility. We analyzed the roles of the nodes in the MongoDB and Hadoop clusters and derived a deployment strategy for the Mongo-Hadoop cluster that considers node locality, resource utilization, and scalability. We also studied and tuned the parameters that affect how Mongo-Hadoop operates and its overall performance.

To identify the optimal integration scheme and better understand the performance trade-offs of using these two technologies together, we designed three benchmarks to test Mongo-Hadoop under different scenarios. The experimental results show that a well-chosen integration scheme yields up to three times the performance of the other schemes; compared with the alternative architectures, it improves performance by 28% while occupying only 50% of the nodes.
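The abstract does not reproduce the connector code itself. As a concrete illustration of the first contribution, the following is a minimal sketch of how a Hadoop MapReduce job can read from and write to MongoDB through the Mongo-Hadoop connector's MongoInputFormat and MongoOutputFormat classes. The database URIs, the "text" field name, and the word-count logic are illustrative assumptions, not details taken from the thesis.

    // Sketch: a word-count-style Hadoop job over a MongoDB collection.
    // Assumes the Mongo-Hadoop connector jar is on the classpath.
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.bson.BSONObject;

    import com.mongodb.hadoop.MongoInputFormat;
    import com.mongodb.hadoop.MongoOutputFormat;

    public class MongoHadoopWordCount {

        // Each input record is one MongoDB document: key = _id, value = BSON body.
        public static class TokenMapper
                extends Mapper<Object, BSONObject, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(Object key, BSONObject doc, Context ctx)
                    throws IOException, InterruptedException {
                Object text = doc.get("text");  // "text" is an assumed field name
                if (text == null) return;
                for (String token : text.toString().split("\\s+")) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }

        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> counts, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable c : counts) sum += c.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the job at MongoDB instead of HDFS; URIs are placeholders.
            conf.set("mongo.input.uri",  "mongodb://localhost:27017/demo.docs");
            conf.set("mongo.output.uri", "mongodb://localhost:27017/demo.counts");

            Job job = Job.getInstance(conf, "mongo-hadoop word count");
            job.setJarByClass(MongoHadoopWordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // The connector supplies the input splits and record readers.
            job.setInputFormatClass(MongoInputFormat.class);
            job.setOutputFormatClass(MongoOutputFormat.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

In this arrangement the connector computes input splits directly over the MongoDB collection, so deployment choices and split-related parameters (for example, the documented mongo.input.split_size key) influence data locality and overall job performance, which is consistent with the tuning work the thesis describes.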
Keywords/Search Tags: Integration, MongoDB, Hadoop, Big data, Cluster deployment, Parameter optimization