
Research On Load Optimization Of MapReduce Resource Scheduling Mechanism In Heterogeneous Environments

Posted on: 2018-07-02; Degree: Master; Type: Thesis
Country: China; Candidate: W D Cai; Full Text: PDF
GTID: 2348330518998078; Subject: Software engineering
Abstract/Summary:
In 2006, Google, Amazon and other companies put forward the concept of "cloud computing", through which users can quickly publish and adjust application resources based on traffic load. Recently, the rapid development of virtualization and container technology has given upper-layer applications more reliable and efficient use of resources. The Internet of Things and social networks generate large amounts of data that traditional data processing platforms cannot handle well. The birth of Hadoop made it possible to process massive data efficiently, and Hadoop has been widely used in engineering, where it shows excellent performance. However, MapReduce, its core computing component, still suffers from load skew in heterogeneous environments, which makes task execution inefficient. Although MapReduce has received attention from a large number of scholars and engineers, there is still no effective solution for accurately estimating real-time machine load and task execution time. In addition, efficient and precise allocation of MapReduce tasks is important for optimizing and balancing the load across nodes. This paper presents a load optimization method for the MapReduce resource scheduling mechanism in heterogeneous environments. The specific research contents are as follows:

(1) Based on real-time resource information, an adaptively speculative execution (ASE) strategy is proposed. A hierarchical storage mechanism is established to store the running information of different nodes in different stages separately. Using a linear regression algorithm, the running time of the current task and that of its backup task are predicted. Depending on the real-time condition of the cluster, different steps are then taken to speed up the job. Finally, the performance of ASE is evaluated in a cluster under different loads.

(2) Based on the double-layer resource scheduling model, a second-layer scheduling algorithm (SSA) is proposed to optimize the double-layer resource scheduling model of MapReduce. A prediction model based on K-ELM (PMK-ELM) is established, and its prediction accuracy is evaluated. PMK-ELM is then added to the scheduling process of SSA. When the Map tasks finish, the prediction model calculates the execution time required when the intermediate data are allocated to different Reducers. By improving a multi-objective optimization algorithm, job execution time is reduced and the results are distributed in a relatively balanced way. Finally, the proposed adaptively speculative execution strategy is applied, and the load optimization scheme is comprehensively assessed in terms of job execution time and disk space occupancy ratio, covering both computing and storage.
Keywords/Search Tags: MapReduce, Adaptively Speculative Execution, Second-Layer Scheduling Algorithm, Load Optimization
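
The regression step in item (1) of the abstract can be illustrated with a minimal sketch. It is not the thesis's ASE implementation: the abstract does not say which features are regressed, so the sketch assumes that elapsed time is fitted against a task's reported progress, and that a backup task is launched only when a free slot exists and the backup's estimated run time is shorter than the original task's predicted remaining time. All function and parameter names (`fit_line`, `predict_finish_time`, `should_launch_backup`, `backup_estimate`, `free_slots`) are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def predict_finish_time(progress, elapsed):
    """Extrapolate the fitted line to progress = 1.0 (task complete).

    progress: sampled progress scores in [0, 1] (assumed feature)
    elapsed:  seconds since task start at each sample
    """
    a, b = fit_line(progress, elapsed)
    return a * 1.0 + b


def should_launch_backup(progress, elapsed, backup_estimate, free_slots):
    """Assumed decision rule: speculate only if the cluster has spare
    capacity and the backup is expected to finish before the original."""
    if free_slots <= 0:
        return False
    remaining = predict_finish_time(progress, elapsed) - elapsed[-1]
    return backup_estimate < remaining


if __name__ == "__main__":
    samples_progress = [0.1, 0.2, 0.3, 0.4]
    samples_elapsed = [10.0, 20.0, 30.0, 40.0]
    print(should_launch_backup(samples_progress, samples_elapsed,
                               backup_estimate=30.0, free_slots=1))
```

Checking `free_slots` first mirrors the abstract's point that the real-time cluster condition is considered before any action is taken; a loaded cluster would skip speculation regardless of the prediction.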