
Research On Synergistical Configuration Of Parameter And Resource On Spark Streaming

Posted on: 2019-09-22 | Degree: Master | Type: Thesis
Country: China | Candidate: F Liu | Full Text: PDF
GTID: 2428330593950086 | Subject: Computer technology
Abstract/Summary:
Spark Streaming is a batched stream processing system that divides a continuous data stream into discrete datasets and uses the batch processing engine Spark to process them in parallel. It is widely used because of its near-real-time data processing guarantee. When deployed in a cloud environment, Spark Streaming pursues two optimization goals: minimizing end-to-end latency and minimizing resource usage cost. Existing related work does not optimize these two goals jointly. To address this, this thesis proposes a method that configures parameters and resources synergistically, combining parameter configuration and resource allocation into an integrated method that optimizes both performance and cost. The method uses historical log information and tensor completion theory to periodically construct a Spark Streaming performance model, and it uses Dynamic Neighborhood Particle Swarm Optimization to search for the optimized parameter and resource configuration for different data arrival rates. The method supports accurate application performance evaluation and online, real-time acquisition of the optimized solution even when samples are sparse. The main contributions of this thesis are summarized as follows:

(1) A framework that synergistically optimizes parameter configuration and resource allocation for Spark Streaming is proposed. Under this framework, performance modeling and selection of the optimized parameter and resource assignment are carried out periodically in the background while a Spark Streaming application is running. When the load intensity of the application changes, the optimized assignment can be obtained quickly by matching the data arrival rate.

(2) A Spark Streaming performance modeling method based on tensor completion is proposed. Based on the processing model of Spark Streaming, the key factors that significantly affect performance are selected as the feature dimensions of the tensor model, and the model is solved according to the tensor completion principle. This method achieves accurate prediction with few sample data.

(3) A method that selects the optimized parameter and resource configuration assignment based on Dynamic Neighborhood Particle Swarm Optimization is proposed. The method takes minimizing resource usage as the primary optimization goal and minimizing end-to-end latency as the secondary optimization goal. Based on the performance model, key elements such as the particle, neighborhood selection, and fitness function are defined, and the optimized parameter and resource configuration assignment is searched for heuristically to make the selection more efficient.

(4) A prototype system is implemented and its performance is evaluated. The evaluation results show that, compared with the dynamic parameter configuration method DyBBS, the proposed method reduces the average end-to-end latency of Spark Streaming applications by up to 77.8%. Compared with the dynamic resource allocation method DRA, the proposed method reduces the average resource usage of Spark Streaming applications by up to 46.4%.
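
The rate matching in contribution (1) and the two-level objective in contribution (3) can be illustrated with a small Python sketch. This is not the thesis's implementation: the names `Config`, `fitness`, `pick_best`, and `match_rate`, the choice of batch interval and executor count as the configuration dimensions, and the plain minimum over candidates (standing in for the particle swarm search) are all assumptions made for illustration. The abstract does not say how latency constraints are enforced, so none are modeled here.

```python
from bisect import bisect_left
from typing import Callable, Dict, Iterable, NamedTuple, Tuple


class Config(NamedTuple):
    """Hypothetical configuration: one parameter and one resource dimension."""
    batch_interval_s: float  # assumed parameter dimension
    executors: int           # assumed resource dimension


def fitness(config: Config,
            predict_latency: Callable[[Config], float]) -> Tuple[int, float]:
    """Two-level objective: resource usage first, predicted latency second.

    Returning a tuple lets Python compare candidates lexicographically.
    """
    return (config.executors, predict_latency(config))


def pick_best(candidates: Iterable[Config],
              predict_latency: Callable[[Config], float]) -> Config:
    """Pick the candidate with the smallest fitness (stand-in for the search)."""
    return min(candidates, key=lambda c: fitness(c, predict_latency))


def match_rate(table: Dict[float, Config], rate: float) -> Config:
    """Return the configuration stored for the closest recorded arrival rate.

    Raises ValueError if the table is empty.
    """
    rates = sorted(table)
    i = bisect_left(rates, rate)
    # Only the recorded rates just below and just above `rate` can be closest.
    neighbours = rates[max(i - 1, 0): i + 1]
    closest = min(neighbours, key=lambda r: abs(r - rate))
    return table[closest]
```

In this sketch, `predict_latency` plays the role of the performance model, and the table passed to `match_rate` would be filled in the background with one `pick_best` result per observed arrival rate, so that a load change only costs a lookup.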
Keywords/Search Tags: Big data, Spark Streaming, Parameter configuration, Resource allocation, Performance model