
Research on Dynamic Resource Allocation Technology for the Spark Data Processing Framework

Posted on: 2017-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: M M Yang
Full Text: PDF
GTID: 2348330503492922
Subject: Computer technology
Abstract/Summary:
Spark, a large-scale data analytics framework, is a cutting-edge platform for massive data processing. Spark introduces a new data abstraction called the RDD, together with an in-memory computing engine and data reuse techniques, and stores and processes RDDs in memory; this speeds up the execution of big data applications and makes large-scale data manipulation easier.

The dynamic allocation mechanism currently offered by the Spark platform is coarse-grained: it allocates only CPU resources and provides no adjustment at the level of the task executor, so it cannot allocate resources accurately when resource requirements differ across tasks. To solve this problem, we propose a collaborative dynamic resource allocation technique that takes both CPU and memory into account. Its key ideas are to schedule resources at the granularity of the task executor and to adjust the amounts of CPU and memory allocated dynamically, according to how the tasks inside each executor actually use these two resources. By jointly optimizing resource scheduling across multiple task executors, we can raise CPU and memory utilization and ultimately improve the throughput of applications on the Spark platform. The main contributions of this thesis are as follows:

1) Definition of the equilibrium saturation of resource usage within a task executor. Equilibrium saturation is a quantitative measure of how efficiently a task executor uses its resources and is the basis for resource allocation. It describes both the utilization of CPU and memory and the gap between the two kinds of utilization; on the Spark platform, a higher equilibrium saturation means better use of the executor's CPU and memory.

2) A decision-making model for dynamic resource adjustment based on evaluating the equilibrium saturation of resource usage within each task executor. We define the trigger condition for dynamic adjustment in terms of equilibrium saturation; the model targets those task executors that require adjustment. We then design a three-level adjustment strategy covering task parallelism, CPU resource requirements, and reallocation of task executors, with the goal of generating fewer resource fragments.
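This abstract does not give the formula for equilibrium saturation, so the following Scala sketch only illustrates one plausible form of the metric and of the decision model's trigger condition: the mean of CPU and memory utilization, discounted by the gap between them, compared against a threshold. The case class, the functional form, and the threshold value are illustrative assumptions, not the thesis's actual definitions.

    // Hypothetical sketch: the exact definition of equilibrium saturation is not
    // given in this abstract, so the functional form and threshold below are
    // illustrative assumptions only.
    case class ExecutorUsage(cpuUsed: Double, cpuAllocated: Double,
                             memUsed: Double, memAllocated: Double)

    object EquilibriumSaturation {
      // Combined CPU/memory utilization, discounted by the imbalance between them.
      def saturation(u: ExecutorUsage): Double = {
        val cpuUtil = u.cpuUsed / u.cpuAllocated
        val memUtil = u.memUsed / u.memAllocated
        val mean    = (cpuUtil + memUtil) / 2.0
        val gap     = math.abs(cpuUtil - memUtil)
        mean * (1.0 - gap)   // high only when both resources are busy AND balanced
      }

      // Trigger condition of the adjustment decision model (threshold assumed).
      def needsAdjustment(u: ExecutorUsage, threshold: Double = 0.6): Boolean =
        saturation(u) < threshold
    }

Under this assumed form, an executor that is CPU-bound but leaves most of its memory idle receives a low saturation score and would be flagged by the decision model for adjustment.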
3) A dynamic resource allocation method based on the ant colony algorithm. This method combines the resource requirements of the task executors with the resources currently available on the platform in order to maximize platform resource usage. Three kinds of resource demands arise in task executor allocation: adding CPU resources to an existing task executor, restarting a task executor, and launching a new task executor. We treat each task executor as an ant in the ant colony algorithm and map the combination and allocation of resources to a revenue function. On this basis we implement a resource allocation method that recognizes the three kinds of demands above and schedules CPU and memory collaboratively, so as to maximize resource utilization on the platform (a simplified sketch of this search is given after the contribution list).

4) A prototype system, DRSpark, that integrates the research results of this thesis. DRSpark is a dynamic resource allocation prototype built on Spark and Mesos and driven by the evaluation of equilibrium saturation of resource usage within task executors; it combines the equilibrium saturation evaluation with the dynamic resource allocation techniques above.

5) A performance evaluation of DRSpark. We evaluate performance with a mixed workload that simulates a production environment. The results show that, compared with the Standalone, YARN, and Mesos modes, DRSpark increases task throughput by up to 71.14%, with an average improvement of 32.48%, and shortens application turnaround time by up to 37.64%, with an average reduction of 23.71%.
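As an illustration of the ant-colony formulation in contribution 3), the following Scala sketch searches for one allocation action per task executor (add CPU, restart, or launch a new executor) under a made-up revenue function. The action costs, the revenue function, the pheromone parameters, and all identifiers are hypothetical; the thesis defines its own model, which is not reproduced here.

    // Hypothetical, greatly simplified sketch of the ant-colony-based allocation.
    // Action costs, the revenue function and all parameters are illustrative
    // assumptions, not the model defined in the thesis.
    import scala.util.Random

    sealed trait AllocationAction
    case object AddCpuToExecutor  extends AllocationAction // grow an existing executor
    case object RestartExecutor   extends AllocationAction // restart with new CPU/memory sizes
    case object LaunchNewExecutor extends AllocationAction // add a brand-new executor

    object AcoAllocator {
      private val actions = Vector(AddCpuToExecutor, RestartExecutor, LaunchNewExecutor)
      private val rnd     = new Random(42)

      // Assumed per-action resource costs: (CPU cores, memory units).
      private def cost(a: AllocationAction): (Int, Int) = a match {
        case AddCpuToExecutor  => (1, 0)
        case RestartExecutor   => (2, 2)
        case LaunchNewExecutor => (4, 8)
      }

      // Assumed revenue: fraction of the free CPU/memory put to use; zero if the
      // plan is infeasible (assumes freeCpu, freeMem > 0).
      def revenue(plan: Vector[AllocationAction], freeCpu: Int, freeMem: Int): Double = {
        val (cpuCost, memCost) = plan.map(cost).foldLeft((0, 0)) {
          case ((c, m), (dc, dm)) => (c + dc, m + dm)
        }
        if (cpuCost > freeCpu || memCost > freeMem) 0.0
        else (cpuCost.toDouble / freeCpu + memCost.toDouble / freeMem) / 2.0
      }

      // Each task executor is an "ant": pick an action with pheromone-weighted probability.
      private def pick(pheromone: Array[Double]): Int = {
        val r   = rnd.nextDouble() * pheromone.sum
        var acc = 0.0
        pheromone.indices.find { i => acc += pheromone(i); acc >= r }
          .getOrElse(pheromone.length - 1)
      }

      // Search for the plan (one action per executor) with the highest revenue.
      def allocate(numExecutors: Int, freeCpu: Int, freeMem: Int,
                   iterations: Int = 100, evaporation: Double = 0.1): Vector[AllocationAction] = {
        val pheromone   = Array.fill(numExecutors, actions.length)(1.0)
        var best        = Vector.fill(numExecutors)(AddCpuToExecutor: AllocationAction)
        var bestRevenue = revenue(best, freeCpu, freeMem)
        for (_ <- 0 until iterations) {
          val plan = Vector.tabulate(numExecutors)(e => actions(pick(pheromone(e))))
          val r    = revenue(plan, freeCpu, freeMem)
          for (e <- 0 until numExecutors; a <- actions.indices)
            pheromone(e)(a) *= (1.0 - evaporation)          // evaporate old pheromone
          for (e <- 0 until numExecutors)
            pheromone(e)(actions.indexOf(plan(e))) += r     // deposit along this plan
          if (r > bestRevenue) { best = plan; bestRevenue = r }
        }
        best
      }
    }

A call such as AcoAllocator.allocate(numExecutors = 8, freeCpu = 16, freeMem = 32) returns one action per executor; the method described in the thesis works from measured executor requirements and platform state rather than from these fixed illustrative costs.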
Keywords/Search Tags: Big Data, Spark, Resource dynamic allocation, ACO, Distributed In-memory Computing