
Research On Spark Optimization Based On Fine-grained Monitoring

Posted on: 2017-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: H M Kang
Full Text: PDF
GTID: 2308330503987189
Subject: Computer Science and Technology
Abstract/Summary:
Spark offers good fault tolerance and scalability and has been widely adopted in industry. However, because the internals of the Spark platform are hidden from the user, and because performance optimization requires weighing many factors together, Spark tuning is very complex, and inexperienced Spark users often do not know where to start. One notable feature of Spark is its support for cloud services such as Amazon EMR, which greatly benefits small and medium-sized enterprises that need to process big data; running Spark programs on a cloud service has therefore become an attractive choice. Amazon EMR provides great convenience for users who need to run Spark programs, but to use the rented service efficiently, a user must request an optimal resource allocation in order to reduce leasing costs. Service providers do not yet offer such a service, so optimizing the allocation of cluster resources falls to the user. This is a great challenge for Spark users and an urgent problem to be solved.

To address these problems, this paper designs a fine-grained monitoring tool. The main contributions of the paper are:

(1) A study of the major factors that affect Spark performance, and of Spark performance optimization combining cluster resource data with historical Spark execution data. The optimizations cover data serialization, the shuffle manager, data compression, resource scheduling, and the file system strategy. The optimization target is to improve cluster resource utilization and to reduce job execution time.

(2) A performance model of Spark programs, used to predict the execution time of a Spark job and then to optimize the allocation of Spark cluster resources.
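As a concrete illustration of the tuning areas listed in (1), the following is a minimal, hypothetical `spark-defaults.conf` fragment. The property names are standard Spark configuration keys (`spark.shuffle.manager` applies to the Spark 1.x line current at the time of the thesis); the values are example settings, not the thesis's recommendations.

```properties
# data serialization: Kryo is typically faster and more compact than Java serialization
spark.serializer            org.apache.spark.serializer.KryoSerializer
# shuffle manager: sort-based shuffle (Spark 1.x setting)
spark.shuffle.manager       sort
# data compression: compress shuffle output and choose a codec
spark.shuffle.compress      true
spark.io.compression.codec  lz4
# resource scheduling: executor count and sizing requested from the cluster
spark.executor.instances    4
spark.executor.cores        2
spark.executor.memory       4g
```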
The optimization target is to meet the application's requirements while requesting the least cluster resources. In this paper, extensive experiments were conducted to optimize Spark programs and to verify in detail the accuracy of the Spark resource-allocation optimization model. The experimental results show that the model suits a variety of Spark programs, including text processing, machine learning, and graph computation. It not only helps users request a reasonable amount of cluster resources, but is also meaningful to service providers for optimizing cluster resource allocation.
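The abstract does not give the model's form, but the idea of predicting execution time from historical runs and then choosing the smallest adequate allocation can be sketched as follows. This is an illustrative assumption, not the thesis's actual model: execution time is fitted as a linear function of per-executor input size, and the smallest executor count meeting a deadline is selected. All data values are made up.

```python
# Hypothetical sketch: fit t = a * (size / executors) + b from historical job
# runs, then pick the smallest executor count whose predicted time meets a
# deadline. Not the thesis's model; a simple stand-in for the same workflow.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def min_executors(size_gb, deadline_s, a, b, max_execs=64):
    """Smallest executor count whose predicted time meets the deadline."""
    for n in range(1, max_execs + 1):
        if a * (size_gb / n) + b <= deadline_s:
            return n
    return None  # deadline unreachable within max_execs

# Historical runs (made-up): (input size in GB, executors, observed seconds)
history = [(10, 2, 130), (20, 4, 128), (40, 4, 250), (40, 8, 132)]
xs = [s / n for s, n, _ in history]
ts = [t for _, _, t in history]
a, b = fit_linear(xs, ts)

# Smallest allocation predicted to finish an 80 GB job within 300 s
print(min_executors(80, 300, a, b))
```

The design choice mirrors the thesis's stated goal: the user supplies a requirement (a deadline), and the model returns the least allocation that satisfies it, rather than a fixed "best" configuration.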
Keywords/Search Tags: Spark, Performance Monitoring, Performance Optimization, Execution Time Prediction, Resource Allocation Optimization