Study On Hadoop Resource Scheduling Strategy Based On IaaS Cloud Platform

Posted on: 2017-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: B X Wang
Full Text: PDF
GTID: 2308330482479278
Subject: Electronic Science and Technology
Abstract/Summary:
Cloud computing is an active research area both domestically and internationally. It integrates large-scale computing, storage, and network resources over the network and provides them to different users on demand. Hadoop, an open source framework for distributed systems, supports large-scale data computing and storage and is usually deployed on server clusters. Deploying Hadoop in an IaaS cloud environment has many benefits, but the environment provided by an IaaS cloud platform differs significantly from that of traditional physical clusters. This paper focuses on how to deploy Hadoop on a private cloud platform and carries out research on the following three aspects.

(1) Because Hadoop cannot see the underlying resource usage of physical hosts in an IaaS cloud environment, this paper integrates Hadoop into the IaaS cloud platform and proposes a new Dynamic Hadoop Cluster on IaaS architecture (DHCI for short), in order to make better use of physical host resources and enhance the scalability of virtual clusters. In addition to the original packages of the private cloud and Hadoop, the DHCI architecture introduces three core modules: a monitoring module, a virtual machine management module, and a scheduling module. The monitoring module collects the load information of physical hosts, which is used for Hadoop resource scheduling. The other two modules enable the cluster to scale flexibly.

(2) Within the DHCI architecture, this paper proposes a resource scheduling strategy based on load feedback from physical hosts. Specifically, the scheduling module obtains the load information of physical hosts and classifies it. When Hadoop allocates resources, it can then avoid allocating them on overburdened physical hosts.

(3) To improve flexibility, the DHCI architecture deploys computation VMs and storage VMs separately, which has a negative effect on data locality. To solve this problem, this paper designs a dynamic virtual machine migration strategy for the IaaS cloud platform. Based on the idea of "mobile computing", it migrates computation nodes to the host(s) or rack(s) where the corresponding storage nodes are deployed, in order to satisfy the data locality requirement. This reduces bandwidth consumption and improves system performance.

Finally, this paper chooses the open source cloud management platform OpenStack as the IaaS platform and implements the whole system architecture and some of the strategies. The Hadoop performance testing tool HiBench is used to evaluate system performance. The results show that, under the same workload, tasks take less time to run on the DHCI architecture than on a traditional Hadoop cluster. Statistical analysis of the load information of physical hosts, gathered while tasks were running, demonstrates that the proposed solutions help balance the workload. With respect to Hadoop data locality, tasks take less time to run on the DHCI architecture with data locality than on the DHCI architecture without it.
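The load-feedback strategy in aspect (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say which metrics are collected, how many load classes exist, or what thresholds are used. The `HostLoad` fields, the three `LoadLevel` classes, and the 0.3/0.8 thresholds below are all hypothetical, chosen only to show the idea of classifying hosts by load and excluding overburdened ones from allocation.

```python
from dataclasses import dataclass
from enum import Enum


class LoadLevel(Enum):
    # Hypothetical load classes; the abstract does not name the actual ones.
    LIGHT = "light"
    NORMAL = "normal"
    OVERLOADED = "overloaded"


@dataclass
class HostLoad:
    # Hypothetical monitoring record for one physical host.
    host: str
    cpu_util: float  # fraction in [0.0, 1.0]
    mem_util: float  # fraction in [0.0, 1.0]


def classify(load, light=0.3, heavy=0.8):
    """Classify a host by its most utilized resource (assumed thresholds)."""
    peak = max(load.cpu_util, load.mem_util)
    if peak >= heavy:
        return LoadLevel.OVERLOADED
    if peak <= light:
        return LoadLevel.LIGHT
    return LoadLevel.NORMAL


def candidate_hosts(loads, light=0.3, heavy=0.8):
    """Return hosts eligible for allocation, least loaded first.

    Overloaded hosts are dropped, so no resources are allocated on them.
    """
    eligible = [
        load for load in loads
        if classify(load, light, heavy) is not LoadLevel.OVERLOADED
    ]
    eligible.sort(key=lambda load: max(load.cpu_util, load.mem_util))
    return [load.host for load in eligible]


if __name__ == "__main__":
    loads = [
        HostLoad("host-1", cpu_util=0.9, mem_util=0.4),
        HostLoad("host-2", cpu_util=0.2, mem_util=0.1),
        HostLoad("host-3", cpu_util=0.5, mem_util=0.6),
    ]
    print(candidate_hosts(loads))
```

In this sketch a scheduler would only consider containers on VMs whose physical host appears in the returned list. Using the peak of CPU and memory utilization is one simple design choice; a weighted combination of metrics would fit the same structure.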
Keywords/Search Tags: cloud computing, Hadoop, resource scheduling, load balance, elastic scalability, OpenStack