
Design And Implementation Of MapReduce Framework Optimization Scheme Based On OpenStack Platform

Posted on: 2016-02-03; Degree: Master; Type: Thesis
Country: China; Candidate: Q F Xu; Full Text: PDF
GTID: 2208330470466512; Subject: Computer system architecture
Abstract/Summary:
Cloud-based big data processing clusters are being rapidly adopted in industry, owing to the advantages of the elastic service framework and the "pay-as-you-go" business model. With the wide use of cloud-based MapReduce frameworks, an increasing number of solutions have been proposed to optimize cluster performance. The majority of existing studies concentrate on optimizing task scheduling or resource provisioning mechanisms to improve either the platform’s data processing performance or its communication performance separately, without considering the two together. This thesis designs and implements a novel virtual network performance optimization strategy for cloud-based big data processing, which takes both data processing and communication performance into consideration and changes the topology of virtual networks to optimize MapReduce performance.

Our strategy optimizes the virtual network topology based on the multi-host deployment mechanism provided in OpenStack Neutron release Grizzly and later. Specifically, this thesis determines the optimal number of communication agents, the optimal location of each agent, and the optimal matching between VMs and agents. The main work is as follows:

(1) Study how the network agent mechanism in the cloud platform affects the platform's performance for big data processing applications. The study finds that three issues need to be addressed in this thesis: the number of network communication agents, their locations, and the correspondence between agents and virtual machines.

(2) Determine the optimal number of network agents in the cloud platform. I formulate the detailed operating procedure of a cloud-based MapReduce workflow in a multi-host virtual network, and build a multi-objective performance optimization model to obtain the optimal number of agents in the cloud.

(3) Determine where the network communication agents should be deployed. A knapsack formulation is used to determine the optimal deployment locations of the network agents on the platform, with the aim of optimizing communication performance between virtual machines and improving the data transmission capability of the virtual cluster.

(4) Determine the correspondence between agents and virtual machines. Based on the network traffic between virtual machines, a load-balancing algorithm is used to determine the optimal matching, separately from the location problem. Solving this problem is intended to improve the computing performance of the virtual machines and enhance the data processing capability of the virtual cluster.

(5) Implement the strategy (named TOMON) in Python and integrate TOMON into Neutron as a separate plug-in to realize automatic deployment of virtual networks.

(6) Design and run massive data processing experiments on a Hadoop virtual cluster. The experiments show that the optimization proposed in this thesis significantly improves the big data processing capability of the virtual cluster: for jobs of the same size, the completion time on the optimized cluster is significantly reduced.

In this thesis, the virtual network performance optimization strategy has been implemented and tested; experimental results show that the strategy is effective and efficient. The implementation of this strategy benefits research on virtual networks in cloud platforms.
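
As a rough illustration of the VM-to-agent matching step in item (4), the following Python sketch assigns each VM to the currently least-loaded agent, placing the heaviest VMs first. The abstract does not describe TOMON's actual load-balancing algorithm, so this greedy rule, the function name `match_vms_to_agents`, and the traffic figures in the usage note are all assumptions made for illustration only.

```python
import heapq


def match_vms_to_agents(vm_traffic, agents):
    """Greedy least-loaded matching (hypothetical sketch, not TOMON's method).

    vm_traffic: dict mapping VM name -> estimated traffic (any consistent unit)
    agents:     list of agent names
    Returns (assignment, loads): VM -> agent, and agent -> total traffic.
    """
    if not agents:
        raise ValueError("at least one agent is required")

    # Min-heap of (current load, agent name); ties break on agent name.
    heap = [(0.0, agent) for agent in sorted(agents)]
    heapq.heapify(heap)

    assignment = {}
    # Place the heaviest VMs first so later, lighter VMs can even out the loads.
    ordered = sorted(vm_traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    for vm, traffic in ordered:
        load, agent = heapq.heappop(heap)  # least-loaded agent so far
        assignment[vm] = agent
        heapq.heappush(heap, (load + traffic, agent))

    # Recompute per-agent totals from the final assignment.
    loads = {agent: 0.0 for agent in agents}
    for vm, agent in assignment.items():
        loads[agent] += vm_traffic[vm]
    return assignment, loads
```

For example, calling `match_vms_to_agents({"vm1": 40, "vm2": 30, "vm3": 20, "vm4": 10}, ["agent-a", "agent-b"])` spreads the four hypothetical VMs so that each agent carries a total traffic of 50. A greedy rule like this is simple and fast but is not guaranteed to find the optimal matching in general.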
Keywords/Search Tags:Cloud Computing, OpenStack, Virtual Network, Hadoop, MapReduce