
The Practice of Hadoop Job Preparing Optimization

Posted on: 2013-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2218330371978802
Subject: Software engineering

Abstract/Summary:
This article describes my work on a project to optimize Hadoop job preparing at Baidu. The emphasis of the project is reducing the time and memory consumed by the split phase of job preparing. The split phase is the most time-consuming and most important step in job preparing, because it determines how the job's input data is divided, and hence the final number of map tasks. Neither the Baidu Hadoop version nor the Apache community had substantially revised or optimized the split code. As the input data of a single job grew and the number of input files increased, the split phase's excessive memory usage and long running time became apparent. These two problems had already degraded the productivity of the Baidu Hadoop cluster and troubled its users. Hence, in order to improve the capability of the Hadoop cluster and satisfy its users, optimizing the split phase was indispensable.

The split optimization can be divided into four parts: optimizing getBlockLocation calls; optimizing the ls operation that matches files under intermediate input paths; reducing memory usage during splitting; and moving the split phase onto the TaskTracker.

The finished project has made a significant difference in the Baidu Hadoop cluster. All performance tests were passed, satisfying both the users and the leadership. The project successfully met its prospective goals.
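To illustrate why the split phase determines the number of map tasks, the core logic can be modeled outside Hadoop. The sketch below is a simplified, hypothetical model of how FileInputFormat-style splitting cuts each input file into block-sized chunks (Hadoop's actual implementation also consults block locations, minimum/maximum split sizes, and file splittability; the 1.1 slop factor mirrors the convention of folding a small remainder into the last split). The function name and the dictionary-based input are illustrative, not Hadoop API.

```python
def compute_splits(file_lengths, block_size=128 * 1024 * 1024, split_slop=1.1):
    """Model of block-based input splitting.

    file_lengths: dict mapping file path -> length in bytes.
    Returns a list of (path, offset, length) tuples; one map task
    is launched per returned split.
    """
    splits = []
    for path, length in file_lengths.items():
        offset = 0
        remaining = length
        # Cut off full block-sized splits while the remainder is still
        # meaningfully larger than one block (slop avoids a tiny tail split).
        while remaining / block_size > split_slop:
            splits.append((path, offset, block_size))
            offset += block_size
            remaining -= block_size
        # The remainder (up to slop * block_size bytes) becomes the last split.
        if remaining > 0:
            splits.append((path, offset, remaining))
    return splits
```

For example, a single 300-unit file with a 100-unit block size yields three splits, hence three map tasks; a 105-unit file yields only one, because the 5-unit tail falls within the slop margin. With millions of input files, this per-file work (and the block-location lookups behind it) is exactly where the time and memory costs described above accumulate.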
Keywords/Search Tags:Hadoop, Split, Memory Usage, Time-Consuming, Optimization