
A Resource Management Strategy Based On Data Compression Ratio For MapReduce On Hadoop Clusters

Posted on: 2019-03-13 | Degree: Master | Type: Thesis
Country: China | Candidate: J Y Pan | Full Text: PDF
GTID: 2428330563492478 | Subject: Computer system architecture
Abstract/Summary:
The existing resource management model of the MapReduce computing framework on the Hadoop platform has many problems, and they have become increasingly apparent as Hadoop is widely adopted and continuously improved across industries. In real production environments, the shortcomings of the resource scheduling and management mechanism of Hadoop YARN (Yet Another Resource Negotiator) are especially evident. On the one hand, the existing mechanism ignores both the diversity of cluster workloads and the differences in node computing capability: it allocates the same fixed resources to different workloads on different nodes, which wastes resources. On the other hand, the resource containers YARN uses to encapsulate resources are poorly designed, which makes it difficult for cluster users to use containers properly. To address these resource management problems of the MapReduce framework on Hadoop clusters, this thesis proposes a MapReduce resource management strategy based on the data compression ratio. The strategy uses the data compression ratio as the basic characteristic of a workload, collects the workload history generated while the cluster runs, and uses a machine learning model to predict and analyze workload characteristics. A task-level resource scheduler then selects a matching task-level scheduling strategy for workloads with different characteristics, which improves resource utilization and also optimizes the time performance of the workload. In addition, this thesis introduces the concept of an elastic container: by monitoring and evaluating the performance of each cluster node, it controls the degree of concurrency on that node and so avoids performance bottlenecks on individual nodes. It also optimizes container resource monitoring by allowing resource containers to occasionally exceed their allocation, which avoids the repeated execution of workloads and the resource waste caused by overly strict monitoring.

This thesis selects multiple test samples from the big data benchmark and runs multiple performance tests on the strategy. The results show that a Hadoop system with the optimized resource management strategy and elastic containers effectively increases task concurrency on each node and reduces disk I/O requests, which significantly improves the time performance of workloads and reduces the waste of cluster resources. Tests of this work deployed on the Hadoop 2.7 platform show that the proposed resource management strategy improves job response time by 15% to 40% and increases cluster resource utilization by 167%, while task concurrency per node is two to three times that of a native Hadoop cluster.
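The abstract names three mechanisms without giving their details: a compression-ratio workload feature fed to a learned predictor, per-node concurrency scaled to node computing capability, and container monitoring that tolerates occasional overuse. The following Python sketch shows one way these pieces could fit together, under explicit assumptions that do not come from the thesis. The compression ratio is measured with `zlib` on an input sample. A plain historical average stands in for the unspecified machine learning model. The threshold of 3.0 and the two strategy labels are hypothetical. The overuse rule (kill a container only after `patience` consecutive over-limit checks) is an assumed policy. All function and class names are illustrative and are not part of the thesis or of YARN.

```python
import zlib
from dataclasses import dataclass
from statistics import mean


def compression_ratio(sample: bytes) -> float:
    """Original size / zlib-compressed size; higher means more compressible.

    Assumption: the thesis does not say which codec or sample it uses.
    """
    if not sample:
        return 1.0
    return len(sample) / len(zlib.compress(sample))


def predict_ratio(history: list[float]) -> float:
    """Hypothetical stand-in for the thesis's ML model: mean of past ratios."""
    return mean(history) if history else 1.0


def pick_strategy(ratio: float, threshold: float = 3.0) -> str:
    """Hypothetical rule mapping a (predicted) ratio to a scheduling strategy label."""
    return "high_ratio" if ratio >= threshold else "low_ratio"


def node_concurrency(base_slots: int, capability: float) -> int:
    """Scale a node's container count by a relative capability score.

    capability = 1.0 is an assumed reference node; at least one slot is kept.
    """
    return max(1, round(base_slots * capability))


@dataclass
class ElasticMonitor:
    """Assumed elastic-container policy: tolerate brief overuse of the limit."""
    limit_mb: int
    patience: int = 3   # consecutive over-limit checks allowed before a kill
    strikes: int = 0

    def check(self, used_mb: int) -> bool:
        """Record one usage sample; return True if the container should be killed."""
        if used_mb > self.limit_mb:
            self.strikes += 1
        else:
            self.strikes = 0  # usage fell back under the limit, so forgive it
        return self.strikes >= self.patience
```

For example, `pick_strategy(compression_ratio(b"hadoop " * 1000))` returns `"high_ratio"`, because highly repetitive input compresses far better than 3:1. `ElasticMonitor(limit_mb=1024).check(1100)` returns `False`, since a single over-limit sample is tolerated. The design choice is to count consecutive violations instead of killing on the first sample. This lets a container briefly exceed its limit without being restarted, which is the kind of repeated work that strict monitoring causes.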
Keywords/Search Tags: heterogeneous cluster, task resource scheduling, node computing capability, flexible container