Design And Implementation Of A Performance Modeling System On Apache Spark

Posted on: 2020-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Guo
GTID: 2428330602950541
Subject: Engineering
Abstract/Summary:
With the development and widespread application of technologies such as cloud computing and mobile computing, the amount of data generated by the Internet is growing at an exponential rate. Faced with the need to process and mine massive data, the industry has gradually developed a number of big data processing technologies and related development frameworks. To serve different usage scenarios, the Spark data processing framework provides hundreds of configuration items. Because Spark's configuration parameters have a significant impact on the running performance of an application, tuning the Spark configuration is a necessary task.

To improve the efficiency of the Spark framework, researchers at home and abroad have worked in many directions, but research on automatic optimization of configuration parameters is still at the exploration stage. Existing automated optimization methods give insufficient consideration to cost, are difficult to apply in real working scenarios, and leave considerable room for improvement in their optimization effect.

To address these problems, this paper presents a method based on machine learning performance modeling. It predicts the execution time of a target application under different configuration parameters and, on that basis, carries out the task of optimizing the configuration parameters. The main idea is to build an application and model database that stores information about a variety of applications together with their machine learning models. For a target application whose execution time must be predicted under various configurations, key information is first found in and extracted from the database to guide the collection of sample data for the target application; the sample data is then used to train a machine learning algorithm, which produces a performance prediction model of the target application.

The main work of this paper includes:

(1) Application execution status monitoring. The target application is executed in a specific running environment, and indicators such as resource consumption, data flow direction, and the Shuffle process are monitored at each time node to obtain detailed record reports. The feature variables that represent the application are then extracted using the representation method for application features.

(2) Model knowledge extraction method. A model based on statistical learning describes the correlation between each feature value and the result value, that is, the influence of each feature and of feature combinations on the final running time. This information can be extracted to guide the subsequent sample collection process.

(3) Design and implementation of a performance modeling and optimization system. The system implements the whole process: data collection and extraction, feature selection and preprocessing, model training and verification, and persistence. Using the constructed performance prediction model together with a parameter search algorithm, the prediction model is invoked over multiple iterations to calculate the recommended configuration parameter values for the target application.

The performance modeling scheme and the configuration optimization method are verified through simulation experiments on multi-node clusters. The experimental results show that the proposed scheme achieves automatic configuration adjustment and optimization, improves the utilization of system resources without requiring large labor costs, and meets the requirements of the Spark configuration optimization task. In the experiments, the proposed performance modeling and configuration optimization methods achieve a 10% to 25% performance improvement over traditional optimization methods.
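
The loop described above (collect runtime samples, train a prediction model, then search configurations by repeatedly calling the model) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the three parameters and their ranges, the synthetic `simulated_runtime` function that stands in for running a real application, the k-nearest-neighbour regressor used as the learning algorithm, and the random search used as the parameter search algorithm.

```python
import random

# Hypothetical configuration space (name -> inclusive integer range).
# The names loosely echo common Spark settings; the ranges are illustrative only.
SPACE = {
    "executor_memory_gb": (1, 16),
    "executor_cores": (1, 8),
    "shuffle_partitions": (8, 512),
}


def sample_config(rng):
    """Draw one random configuration from SPACE."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}


def simulated_runtime(cfg):
    """Synthetic stand-in for executing the application and timing it."""
    return (
        600.0 / cfg["executor_cores"]
        + 200.0 / cfg["executor_memory_gb"]
        + 0.05 * abs(cfg["shuffle_partitions"] - 200)
        + 30.0
    )


def normalize(cfg):
    """Scale every parameter to [0, 1] so distances are comparable."""
    return [(cfg[name] - lo) / (hi - lo) for name, (lo, hi) in SPACE.items()]


class KnnRuntimeModel:
    """Tiny k-nearest-neighbour regressor: predicts the mean runtime of
    the k training configurations closest to the query configuration."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def fit(self, configs, runtimes):
        self.points = [(normalize(c), t) for c, t in zip(configs, runtimes)]
        return self

    def predict(self, cfg):
        x = normalize(cfg)
        by_distance = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, p)), t) for p, t in self.points
        )
        nearest = by_distance[: self.k]
        return sum(t for _, t in nearest) / len(nearest)


def recommend(model, rng, iterations=2000):
    """Random search: query the model repeatedly and keep the configuration
    with the lowest predicted runtime."""
    best_cfg, best_pred = None, float("inf")
    for _ in range(iterations):
        cfg = sample_config(rng)
        pred = model.predict(cfg)
        if pred < best_pred:
            best_cfg, best_pred = cfg, pred
    return best_cfg, best_pred


rng = random.Random(0)
train_cfgs = [sample_config(rng) for _ in range(200)]
train_times = [simulated_runtime(c) for c in train_cfgs]
model = KnnRuntimeModel(k=3).fit(train_cfgs, train_times)
best_cfg, best_pred = recommend(model, rng)
print(best_cfg, round(best_pred, 1))
```

In this sketch the search never runs the application; only the 200 training samples cost real executions, which is the cost argument for model-based tuning. A real system would replace `simulated_runtime` with measured execution times and would likely use a stronger model and search strategy.
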
Keywords/Search Tags: Configuration Optimization, Performance Modeling, Feature Engineering, Apache Spark