
The Research And Implementation Of Big Data Cloud System Based On Hadoop

Posted on: 2017-10-31 | Degree: Master | Type: Thesis
Country: China | Candidate: M F Chen | Full Text: PDF
GTID: 2348330518995589 | Subject: Information security
Abstract/Summary:
With the rapid development of the Internet, large volumes of data are generated continuously, and companies face the problem of storing and analyzing that data efficiently. Apache Hadoop is the most popular solution in the industry for storing and processing extremely large datasets, but problems remain in storage scalability, application scenarios, deployment efficiency and resource utilization.

This thesis analyzes the key technologies related to Hadoop and selects the GlusterFS distributed file system and the OpenStack cloud computing platform to design and implement a big data cloud system based on Hadoop. The main work includes the following:

(1) We study the key technologies related to Hadoop, including HDFS, GlusterFS, MapReduce and OpenStack, from system architecture to implementation.

(2) Because HDFS has a single point of failure and limited application scenarios, we replace it with GlusterFS as the backend storage system of Hadoop. We present two architectures for running Hadoop MapReduce on GlusterFS, evaluate the performance of each, and compare them with the original HDFS+MapReduce architecture.

(3) To improve the deployment efficiency and resource utilization of Hadoop, we deploy Hadoop virtual machines on OpenStack, and design and implement dynamic scheduling of those virtual machines.

The resulting big data cloud system improves storage scalability, deployment efficiency and resource utilization, and broadens the range of application scenarios.
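The abstract does not describe how the dynamic scheduling of Hadoop virtual machines works. As a rough illustration only, one common approach is a threshold rule on cluster load. The sketch below assumes such a rule; the names `ClusterState` and `scaling_decision`, the CPU metric, and all threshold values are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of a Hadoop cluster running on virtual machines."""
    worker_vms: int             # number of Hadoop worker VMs currently running
    avg_cpu_utilization: float  # mean CPU load across workers, from 0.0 to 1.0


def scaling_decision(state, low=0.3, high=0.8, min_vms=2, max_vms=16):
    """Return the change in worker VM count: +1, -1, or 0.

    Assumed rule (not the thesis's algorithm): add a VM when average load
    exceeds `high`, remove one when it falls below `low`, and always stay
    within [min_vms, max_vms].
    """
    if state.avg_cpu_utilization > high and state.worker_vms < max_vms:
        return 1
    if state.avg_cpu_utilization < low and state.worker_vms > min_vms:
        return -1
    return 0


if __name__ == "__main__":
    # A busy 4-worker cluster: the rule asks for one more VM.
    print(scaling_decision(ClusterState(worker_vms=4, avg_cpu_utilization=0.9)))
```

In a real deployment the returned value would drive calls to the cloud platform to boot or delete worker instances; that step is omitted here because the abstract gives no detail about it.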
Keywords/Search Tags: Hadoop, GlusterFS, OpenStack, Dynamic Scheduling