The Study And Optimization Of Hadoop Framework On High Performance Computer

Posted on: 2015-10-04 | Degree: Master | Type: Thesis
Country: China | Candidate: L T Sun | Full Text: PDF
GTID: 2348330509960920 | Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, data of all kinds is being produced, and the diversity of its structure and its sheer volume pose serious challenges to the relevant technologies. For many applications that process big data, the MapReduce programming framework has a distinct advantage, and it has been applied successfully by many Internet companies, such as Google and Alibaba. Meanwhile, as their performance increases, high performance computers are widely used in biology, astrophysics and other fields. How to deploy and use the MapReduce programming framework effectively on existing high performance computing platforms has therefore become a hot research issue.

In this paper, the MapReduce programming framework is successfully deployed on a high performance computer. We propose optimization methods based on an analysis of the problems in I/O processing and task scheduling. The paper focuses on the following studies:

(1) We study the theory and technology involved in running the Hadoop framework on high performance computing platforms, to gain a broader understanding of the MapReduce programming model and its main I/O processes. Deploying the MapReduce programming model directly on a high performance computer raises several problems, such as compatibility issues, a reduced data locality advantage and increased I/O contention. Existing research focuses mainly on optimizing the network transmission and storage of intermediate data, and has achieved some effect. Building on that research, this paper optimizes task scheduling and storage resource management.

(2) For the Hadoop platform on a high performance computer with an object-based storage system, we optimize the shuffle process based on the network memory of the nodes, and construct the implementation method at the task scheduling and file system levels. We propose an I/O optimization method based on balanced scheduling for the MapReduce framework on high performance computers, which addresses the I/O efficiency problem of intermediate data and temporary data. By analyzing the I/O load information of the storage nodes, we adopt a more reasonable storage strategy and achieve dynamic load balancing of the storage system.

(3) For the Hadoop platform on a high performance computer with a multilayer storage architecture, we present a task scheduling optimization strategy for multiple groups of I/O acceleration nodes that exploits the different amounts of intermediate results corresponding to the reduce tasks. We also put forward an I/O quality-of-service maintenance method that isolates the storage service, ensuring the quality of storage service for high-priority jobs.

(4) For the Hadoop platforms on high performance computers with an object-based storage system and with a multilayer storage architecture, we test the above optimization methods respectively in a simulation environment. We compare them with existing methods to verify their optimization effect, and analyze the experimental results.
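
The balanced-scheduling idea in study (2), choosing where to store intermediate data from the I/O load of the storage nodes, can be illustrated with a small sketch. This is not the thesis's implementation, which the abstract does not specify; it is a generic greedy "least-loaded node first" placement. The function name `place_intermediate_outputs`, the node names and the use of pending bytes as the load metric are all assumptions made for illustration.

```python
import heapq


def place_intermediate_outputs(node_names, output_sizes):
    """Greedily assign each map output to the least-loaded storage node.

    node_names:   list of storage node identifiers (hypothetical names).
    output_sizes: dict mapping task id -> intermediate output size in bytes.
    Returns (placement, loads): task id -> node, and node -> total bytes.
    Assumption: load is approximated by the bytes already assigned to a node.
    """
    # Min-heap of (assigned_bytes, node_name); ties break on the node name.
    heap = [(0, name) for name in node_names]
    heapq.heapify(heap)

    placement = {}
    # Place the largest outputs first so later, smaller ones even out the load.
    for task_id, size in sorted(output_sizes.items(),
                                key=lambda kv: (-kv[1], kv[0])):
        load, name = heapq.heappop(heap)   # currently least-loaded node
        placement[task_id] = name
        heapq.heappush(heap, (load + size, name))

    loads = {name: load for load, name in heap}
    return placement, loads


if __name__ == "__main__":
    placement, loads = place_intermediate_outputs(
        ["oss0", "oss1"],
        {"m0": 40, "m1": 30, "m2": 20, "m3": 10},
    )
    print(placement)
    print(loads)
```

A real scheduler would refresh the load figures from live I/O statistics of the storage nodes instead of counting assigned bytes, which is what makes the balancing dynamic in the sense described above.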
Keywords/Search Tags: MapReduce, high performance computing, multilayer storage architecture, Hadoop