Hadoop Parameters Tuning Method Based On Machine Learning

Posted on: 2017-05-25 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Tong | Full Text: PDF
GTID: 2348330503472459 | Subject: Computer technology
Abstract/Summary:
Because MapReduce is a relatively new technology, qualified administrators are hard to find, and users often run into performance problems because they do not know how to set its parameters. The configuration space is huge: more than 70 parameters affect job performance, so tuning the parameters of a MapReduce job is difficult and time-consuming. Users who lack a good understanding of the characteristics of their MapReduce applications find it very hard to choose appropriate parameter values. Existing tuning methods are slow and inefficient: they cannot cope with the rapid growth in the number of parameters and the volume of data, and they require multiple test runs and a significant amount of human effort.
We designed and implemented a Hadoop job parameter tuning method based on machine learning techniques, and verified it in HMAT, a system built for this thesis. Given that a Hadoop cluster is highly configurable and its hardware resources are fixed, the method uses a novel two-stage machine learning approach to configure Hadoop MapReduce parameters automatically, and it also adapts to new ad-hoc jobs submitted to the cluster. The core technology is a performance model based on a support vector machine, which combines the input data size of a Hadoop job with its configuration parameters. To accommodate ad-hoc jobs, the method makes full use of the resource utilization features of previous jobs when making configuration decisions. To build the performance model, we collected and analyzed statistics from running MapReduce jobs. Given a job's parameter configuration and input data size, the model predicts the execution time of the program.
Finally, a pattern matching search of the parameter space achieves automatic tuning of Hadoop parameters. We evaluated the effectiveness of the system with typical Hadoop applications. The test results indicate that some Hadoop jobs tuned by the system improve on the default parameters by as much as 18% or more, and that even TeraSort jobs outside the training set improve by a factor of about 4 to 5.
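The two stages described above (predict execution time from input size and configuration, then search the parameter space for the configuration with the lowest predicted time) can be sketched as follows. This is a minimal illustration under stated assumptions, not the HMAT implementation: `toy_predict_seconds` is a hypothetical analytic stand-in for the trained support vector machine model, the search is a plain exhaustive grid search rather than the search procedure used in the thesis, and the three Hadoop properties and their candidate values were chosen only for illustration.

```python
from itertools import product

# Assumed search space: real Hadoop 2.x property names, illustrative values.
# The thesis does not say which of the 70+ parameters it tunes.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": [100, 200, 400],
    "mapreduce.job.reduces": [1, 4, 8, 16],
    "mapreduce.map.output.compress": [False, True],
}


def toy_predict_seconds(input_gb, config):
    """Hypothetical stand-in for the performance model.

    In the thesis this role is played by a support vector machine trained
    on statistics from previous job runs. The formula below is made up so
    that the example runs without any training data.
    """
    sort_mb = config["mapreduce.task.io.sort.mb"]
    reduces = config["mapreduce.job.reduces"]
    compress = config["mapreduce.map.output.compress"]

    map_cost = input_gb * 60.0 * (100.0 / sort_mb) ** 0.5
    shuffle_cost = input_gb * 40.0 * (0.6 if compress else 1.0)
    # More reducers shorten the reduce phase but add per-task overhead.
    reduce_cost = input_gb * 80.0 / reduces + 2.0 * reduces
    return map_cost + shuffle_cost + reduce_cost


def best_config(predict, input_gb, space):
    """Return (config, predicted_seconds) with the lowest predicted time.

    Enumerates every combination in `space`; this is only practical for
    small spaces like the one above.
    """
    names = list(space)
    best = None
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        seconds = predict(input_gb, config)
        if best is None or seconds < best[1]:
            best = (config, seconds)
    return best


if __name__ == "__main__":
    config, seconds = best_config(toy_predict_seconds, 10, SEARCH_SPACE)
    print("predicted seconds:", seconds)
    for name, value in config.items():
        print(f"  {name} = {value}")
```

Passing the model in as a callable keeps the search independent of how the predictions are produced, so a trained regressor could replace the toy function without changing `best_config`.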
Keywords/Search Tags: Performance Optimization, MapReduce, Machine Learning, Automatic Parameter Adjustment