
Researches On Optimization Of Resource Allocation For MapReduce Scheduling

Posted on: 2014-04-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H W Han
Full Text: PDF
GTID: 1268330425476714
Subject: Computer application technology
Abstract/Summary:
In big data processing platforms, job frequency and data density increase continuously, together with platform resources. To achieve excellent carrying capacity, it is important to allocate platform resources properly among big data computation jobs throughout their complicated execution and concurrent scheduling. Existing research on data-oriented parallel programming models pays more attention to implementing parallel job execution than to the differing resource demands of different users and of different job execution processes. This leaves a large opportunity to improve resource utilization and business carrying capacity by optimizing resource allocation among different computation jobs and among the execution processes of each job.

Resource allocation optimization for big data processing platforms is a new research area, driven by the development of big data applications, and related work is still scarce. To address this gap, this study proposes a complete resource allocation optimization model for the MapReduce framework, based on an in-depth study of resource allocation during vertical MapReduce job execution and horizontal concurrent multi-job scheduling. The model extends the MapReduce programming model and its supporting system by optimizing resource allocation at both levels, with the goal of improving resource utilization and business carrying capacity in big data processing platforms.

Specifically, the main contributions of this study are as follows:

1. A new concept, the computation job execution profile, is proposed to provide self-adaptive capacity for the dynamic nature of big data processing. Based on a comprehensive study of the detailed mechanisms of the MapReduce programming model and its supporting system, the structure and constituent fields of the profile are defined according to the fine-grained execution phases of a MapReduce job. A non-invasive dynamic probe program is then designed and developed using the BTrace technique to trace a MapReduce job during its execution and collect fine-grained execution information in real time, from which the value of each profile field is computed.

2. From the vertical job execution perspective, a new adaptive dynamic auto-tuning method, Profile-Predict-Optimize (PPO), is proposed. It consists of three phases (job execution status profiling, job performance prediction, and job performance optimization), with a corresponding MapReduce job performance prediction model and MapReduce job performance optimization model. The prediction model predicts the performance of a MapReduce job from a given job execution profile and resource allocation plan. Using the prediction model, the optimization model finds the best resource allocation plan by efficiently searching the space of resource allocation plans according to the user's optimization demand. The experimental results show that the prediction model can clearly and effectively identify the better configuration values, although probe overhead causes it to overestimate task execution time by an average of 15.1%. On this basis, the optimization model improves job execution time by an average of 42% and a maximum of 25.7% compared with the commonly used rule-of-thumb methods for multiple concurrent computation jobs.

3. A new adaptive Resource-aware Dynamic Scheduler (RDS) is proposed and constructed for the problem of scheduling multiple tasks concurrently. RDS both meets different levels of customer satisfaction and improves resource utilization. It senses resource usage in a timely manner through a resource placement matrix, which records the computing resource assignments on each processing node and is updated dynamically, and it maximizes total task utility through a task effectiveness evaluation model based on user QoS requirements. The comprehensive evaluation results show that RDS dynamically adjusts platform resource allocation among multiple concurrent computation jobs under both relaxed and tight completion time goals, and that it outperforms Hadoop's fair scheduler, reducing completion time for multiple computation jobs by about 5-100%.
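
The Profile-Predict-Optimize loop in contribution 2 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the profile fields, the wave-based cost formula, and the exhaustive search are hypothetical stand-ins, since the abstract does not give the dissertation's actual profile fields, prediction model, or search strategy.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical profile fields measured from a traced job run."""
    input_mb: float           # total job input size
    map_mb_per_sec: float     # throughput of one map task
    map_selectivity: float    # map output size / map input size
    reduce_mb_per_sec: float  # throughput of one reduce task


@dataclass(frozen=True)
class AllocationPlan:
    """Hypothetical resource allocation plan for one job."""
    map_slots: int
    reduce_slots: int
    split_mb: int


def predict_seconds(profile: JobProfile, plan: AllocationPlan) -> float:
    """Predict phase: toy cost model (map waves plus an evenly split reduce)."""
    map_tasks = math.ceil(profile.input_mb / plan.split_mb)
    map_waves = math.ceil(map_tasks / plan.map_slots)
    map_time = map_waves * (plan.split_mb / profile.map_mb_per_sec)
    shuffle_mb = profile.input_mb * profile.map_selectivity
    reduce_time = (shuffle_mb / plan.reduce_slots) / profile.reduce_mb_per_sec
    return map_time + reduce_time


def optimize(profile: JobProfile, total_slots: int, split_sizes: list) -> AllocationPlan:
    """Optimize phase: search every plan within the slot budget, keep the fastest."""
    candidates = (
        AllocationPlan(m, total_slots - m, s)
        for m, s in product(range(1, total_slots), split_sizes)
    )
    return min(candidates, key=lambda plan: predict_seconds(profile, plan))


profile = JobProfile(input_mb=1024, map_mb_per_sec=10,
                     map_selectivity=0.5, reduce_mb_per_sec=5)
best = optimize(profile, total_slots=8, split_sizes=[64, 128])
print(best, predict_seconds(profile, best))
```

The point of the sketch is the division of labour: the profile is measured once, the prediction function scores any candidate plan without running the job, and the optimizer only needs that score to compare plans. A real plan space would be far larger, so an exhaustive search like this one would likely be replaced by a guided search.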
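
The resource-aware scheduling idea in contribution 3 can likewise be sketched in Python. This is not the RDS algorithm itself: the utility function, the two-column placement matrix (free CPU cores and free memory per node), and the greedy first-fit placement are all hypothetical simplifications, and a greedy pass is only a heuristic for maximizing total utility.

```python
from dataclasses import dataclass


@dataclass
class Task:
    job: str
    cpu: int                 # cores requested
    mem_gb: int              # memory requested
    est_seconds: float       # estimated run time
    deadline_seconds: float  # time left until the job's QoS completion goal


def utility(task: Task) -> float:
    """Hypothetical utility: the less slack a task has, the higher it scores."""
    slack = task.deadline_seconds - task.est_seconds
    return 1.0 / max(slack, 1.0)


def schedule(tasks: list, free: list) -> dict:
    """Greedily place tasks in descending utility order.

    `free` plays the role of the placement matrix: one [free_cpu, free_mem_gb]
    row per node, updated in place as tasks are assigned.
    Returns {task index: node index} for the tasks that fit.
    """
    placement = {}
    order = sorted(range(len(tasks)), key=lambda i: utility(tasks[i]), reverse=True)
    for i in order:
        task = tasks[i]
        for node, (cpu, mem) in enumerate(free):
            if task.cpu <= cpu and task.mem_gb <= mem:
                free[node] = [cpu - task.cpu, mem - task.mem_gb]
                placement[i] = node
                break  # first node with enough capacity wins
    return placement


tasks = [
    Task("a", cpu=2, mem_gb=4, est_seconds=100, deadline_seconds=1000),
    Task("b", cpu=2, mem_gb=4, est_seconds=100, deadline_seconds=120),
    Task("c", cpu=4, mem_gb=8, est_seconds=50, deadline_seconds=60),
]
free = [[4, 8], [2, 4]]
print(schedule(tasks, free), free)
```

In this example the two tasks closest to their deadlines are placed first and the relaxed task waits for capacity, which mirrors how a scheduler can serve both tight and relaxed completion time goals from the same resource pool.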
Keywords/Search Tags: MapReduce Programming Model, Profile, Performance Predict, Performance Optimization, Task scheduling, Resource-aware