The Research And Implementation Of Big Data Cloud System Based On Hadoop

Posted on:2017-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:M F Chen

Full Text:PDF

GTID:2348330518995589

Subject:Information security

Abstract/Summary:

With the rapid development of the Internet, large volumes of data are generated continuously, and companies face the problem of how to store and analyze this data efficiently. Apache Hadoop is the most popular solution in the industry for storing and processing extremely large datasets, but problems remain in terms of storage scalability, application scenarios, deployment efficiency and resource utilization.

This thesis analyzes the key technologies related to Hadoop and chooses the GlusterFS distributed file system and the OpenStack cloud computing platform to design and implement a big data cloud system based on Hadoop. The main work includes the following aspects:

(1) We study the key technologies related to Hadoop, including HDFS, GlusterFS, MapReduce and OpenStack, from system architecture to implementation.

(2) To address the single point of failure and the limited application scenarios of HDFS, we use GlusterFS to replace HDFS as the backend storage system of Hadoop. We present two different architectures for enabling Hadoop MapReduce on GlusterFS, evaluate the performance of each, and compare them with the original HDFS + MapReduce architecture.

(3) To improve the deployment efficiency and resource utilization of Hadoop, we deploy Hadoop virtual machines on OpenStack, and design and implement dynamic scheduling of the Hadoop virtual machines.

The resulting big data cloud system based on Hadoop improves storage scalability, broadens the applicable scenarios, and raises deployment efficiency and resource utilization.
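
The abstract does not describe the scheduling policy itself, so the following is only a minimal sketch of what a dynamic scheduling decision for Hadoop worker virtual machines might look like. It assumes a simple threshold rule on average CPU utilization; the names `ClusterLoad` and `plan_worker_count`, the thresholds and the worker limits are all hypothetical and are not taken from the thesis. Launching or removing virtual machines through OpenStack is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ClusterLoad:
    """Snapshot of a Hadoop cluster (hypothetical structure, not from the thesis)."""
    worker_count: int           # number of running worker VMs
    avg_cpu_utilization: float  # average CPU utilization across workers, 0.0 to 1.0


def plan_worker_count(load: ClusterLoad,
                      low: float = 0.3,
                      high: float = 0.8,
                      min_workers: int = 2,
                      max_workers: int = 16) -> int:
    """Return the target number of worker VMs under an assumed threshold policy.

    Add one worker when utilization is above `high`, remove one when it is
    below `low`, otherwise keep the current size. The result is clamped to
    the range [min_workers, max_workers].
    """
    if load.avg_cpu_utilization > high:
        target = load.worker_count + 1
    elif load.avg_cpu_utilization < low:
        target = load.worker_count - 1
    else:
        target = load.worker_count
    return max(min_workers, min(max_workers, target))


if __name__ == "__main__":
    snapshot = ClusterLoad(worker_count=4, avg_cpu_utilization=0.9)
    print(plan_worker_count(snapshot))
```

In a real deployment the returned target would be compared with the current worker count, and the difference would drive virtual machine creation or deletion on the cloud platform; changing the size by one step at a time is an assumption chosen here to avoid oscillation.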

Keywords/Search Tags:

Hadoop, GlusterFS, OpenStack, Dynamic Scheduling

Related items

1	Research And Implementation Of Dynamic Resource Scheduling Method Based On OpenStack
2	Research And Implementation Of OpenStack-based Cloud Platform Resource Scheduling Strategies
3	Research On Virtual Machine Resource Scheduling Technology Of OpenStack Cloud Platform
4	Study On Hadoop Resource Scheduling Strategy Based On IaaS Cloud Platform
5	Research On Optimization Technology Of Distributed File System Based On Hadoop
6	Design Of Resource Dynamic Scheduling System Based On OpenStack
7	Research On Dynamic Job Scheduling Based On Hadoop Heterogeneous Cluster
8	Research Of The Cloud Computing Platform And Scheduling Scheme Based On Openstack
9	The Technology Research Of Resource Scheduling Based On OpenStack
10	Research And Construction Of Industry Cloud Computing Platform Based On Openstack