Statistical Learning-based Forecasting Method for Data-intensive MapReduce Program Execution Time

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: H R Zhang
Full Text: PDF
GTID: 2428330566998104
Subject: Computer Science and Technology
Abstract/Summary:
More and more Internet companies rely on large-scale data analysis, such as log analysis, feature extraction, and data filtering, as part of their core services. Through its Hadoop implementation, the MapReduce model has proven effective for processing such data. One of the key challenges in this kind of analysis is predicting the execution time of individual jobs, which matters greatly for resource management and job scheduling. However, MapReduce programs become complex and diverse as they are applied to complex problems, and predicting execution time in such an environment is the core difficulty. To address it, we restrict the problem to specific conditions and decompose it. We limit the application type to data-intensive programs and, to keep the model general, ignore cases where expert operators influence performance by tuning parameters. Data-intensive programs are characterized by relatively little CPU computation, a large share of I/O time, and a larger influence of the algorithm's time complexity. We divide the application scenarios into three categories.

(1) The application itself stays fixed and only the input data changes. For this case we use a KCCA (kernel canonical correlation analysis) model, which can predict program execution time accurately from very few features. However, the training procedure has to be worked out from the derivation of KCCA. Moreover, the input features are closely tied to the program type, so the model does not generalize and is suitable only for predicting a single type of program.

(2) To handle more complex and diverse scenarios, we propose a baseline prediction model. Based on an analysis of the MapReduce execution process and its intermediate results, different forecasting models are used for different stages according to the characteristics of each stage, and these models are then combined using ensemble learning. Experiments verify that the combined model predicts well for programs of the same type.

(3) We then introduce the concept of a meta-operation. Building on the theory of algorithmic reduction, meta-operations are related to complex algorithms, and two prediction methods suited to different situations are given. The first is an empirically based prediction method, which works well on small data sets. The second is a prediction method based on pre-execution on sampled data, which is more suitable for large data sets.
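The sampling-based pre-execution idea in (3) can be illustrated with a minimal sketch. The abstract does not describe the actual procedure, so everything below is an assumption made for illustration: the job is run on a few small samples of the input, a straight line is fitted to the measured times, and the line is extrapolated to the full input size. The linear relation between input size and time, the sample fractions, and the names `fit_line`, `predict_by_sampling`, and `run_job` are all hypothetical and are not taken from the thesis.

```python
from typing import Callable, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct sample sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_by_sampling(
    run_job: Callable[[int], float],
    full_size: int,
    fractions: Sequence[float] = (0.01, 0.02, 0.04),
) -> float:
    """Estimate the full-size run time from a few small pre-executions.

    run_job(size) is a hypothetical callback that executes the job on
    `size` input records and returns the elapsed time in seconds.
    Assumption: time grows roughly linearly with input size, which is
    plausible for I/O-bound jobs but is not claimed by the thesis.
    """
    sizes = [max(1, int(full_size * f)) for f in fractions]
    times = [run_job(size) for size in sizes]
    intercept, slope = fit_line(sizes, times)
    return intercept + slope * full_size


def simulated_job(size: int) -> float:
    """Stand-in for a real job: fixed startup cost plus per-record cost."""
    return 5.0 + 0.002 * size


if __name__ == "__main__":
    estimate = predict_by_sampling(simulated_job, full_size=1_000_000)
    print(f"estimated run time: {estimate:.1f} s")
```

In practice `run_job` would submit the MapReduce job on a sampled subset of the input and measure wall-clock time. A job whose cost is not linear in input size (for example, one dominated by a sort) would need a different curve than the straight line assumed here.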
Keywords/Search Tags: time prediction, feature selection, KCCA model, double layer model, reduction