
Research On The Performance Modeling Of Spark Streaming

Posted on: 2018-09-17 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Hou
GTID: 2348330563452518 | Subject: Computer technology
Abstract/Summary:
Spark Streaming is a state-of-the-art batched stream computing system for big data. Its core technique is to split the incoming data stream into a sequence of small batches by time and to process each batch periodically with batch computing, which yields near-real-time response together with high data throughput. Existing performance analyses of Spark Streaming rely mainly on observation: they monitor the performance of the system as a whole, or of some of its components, and analyze the results. However, Spark Streaming is a complex system made up of multiple phases and components (data receiving, storage, computing, etc.), and there are data transfer and performance dependencies between these phases. Applying existing performance analysis methods to production platforms reveals two shortcomings: first, the precise performance bottleneck cannot be determined from the data dependencies between components; second, system performance cannot be evaluated quickly and quantitatively as the stream data load fluctuates. Performance modeling is an important method of computer system performance analysis: it uses mathematical theory and methods to describe the relations among performance, system, and load, and it is a major means of analyzing the performance of computing systems quantitatively. To the best of our knowledge, performance modeling of Spark Streaming has not yet been studied. This thesis therefore proposes a performance modeling technique for Spark Streaming based on queuing theory and, by analyzing the model mathematically, obtains quantitative system performance under different data load intensities in order to guide system performance optimization. The major contributions of the thesis are as follows:
(1) A quantitative performance model of Spark Streaming based on queuing theory is established. The data processing flow of Spark Streaming is divided into phases. Based on queuing theory, reasonable assumptions are made about the data arrival and service characteristics of each phase, and the working principle of each phase is analyzed. On this basis, a corresponding queue model is selected for each phase, and a method for computing the data response time is given.
(2) Methods for computing the model parameters and for simplifying the performance model are designed. The methods for acquiring and computing the parameters of the model are defined. Because acquiring the model parameters is costly, a model simplification method based on key-component selection is proposed: components that account for a larger share of the sojourn time and that show random variation are selected as key components, which reduces both the complexity of the model and its performance intrusion on the system.
(3) The accuracy of the model is verified, and the model is used to guide system performance optimization. Evaluation tests were conducted with Spark Streaming stream workloads. The results show that in 90% of cases the error between the data response time computed by the model and the measured data response time is less than 8%. In addition, the model is used for online bottleneck determination in Spark Streaming, and corresponding optimization strategies are proposed. The test results show that, with the model's guidance, the system's data response time is reduced by 11.20% on average and by up to 15.88%.
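The abstract does not name the queue model chosen for each phase, so the following is only a minimal sketch under stated assumptions: every phase is treated as an M/G/1 queue fed by Poisson arrivals at the same rate λ (an approximation, since arrivals at downstream phases are generally not Poisson and Spark Streaming emits batch jobs at a fixed batch interval). Under that assumption, the mean data response time is the sum of the per-phase mean sojourn times given by the Pollaczek-Khinchine formula, and the phase with the largest sojourn time is a natural bottleneck candidate. The `Phase` class, the function names, and the parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Phase:
    """One processing phase, assumed here to behave as an M/G/1 queue."""
    name: str
    mean_service: float   # E[S], mean service time in seconds
    second_moment: float  # E[S^2], second moment of the service time


def mg1_sojourn(arrival_rate: float, phase: Phase) -> float:
    """Mean sojourn time (queueing delay + service) by Pollaczek-Khinchine."""
    rho = arrival_rate * phase.mean_service  # server utilization
    if rho >= 1.0:
        raise ValueError(f"phase {phase.name!r} is unstable (rho = {rho:.3f})")
    mean_wait = arrival_rate * phase.second_moment / (2.0 * (1.0 - rho))
    return mean_wait + phase.mean_service


def response_time(arrival_rate: float, phases: List[Phase]) -> float:
    """Mean end-to-end response time: sum of the per-phase sojourn times."""
    return sum(mg1_sojourn(arrival_rate, p) for p in phases)


def bottleneck(arrival_rate: float, phases: List[Phase]) -> str:
    """Name of the phase with the largest mean sojourn time."""
    return max(phases, key=lambda p: mg1_sojourn(arrival_rate, p)).name


# Hypothetical parameters: one batch job every 2 s (lambda = 0.5 per second).
phases = [
    Phase("receive", mean_service=0.2, second_moment=0.05),
    Phase("compute", mean_service=1.2, second_moment=1.6),
]
print(f"{response_time(0.5, phases):.3f} s, bottleneck: {bottleneck(0.5, phases)}")
# prints: 2.414 s, bottleneck: compute
```

As a sanity check, exponential service times give E[S^2] = 2E[S]^2, and the sojourn time then reduces to the familiar M/M/1 result 1/(μ - λ). Raising an error when utilization reaches 1 reflects that such a phase has no steady state, which is itself a clear sign of a bottleneck.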
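The model parameters have to be measured from the running system. A minimal sketch follows. It assumes that arrival timestamps and per-phase service times have already been collected, because the abstract does not describe the collection mechanism. From those samples it estimates the arrival rate and the first two service-time moments, which a single-server queue model such as M/G/1 requires. The function names are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def estimate_arrival_rate(arrival_times: Sequence[float]) -> float:
    """Arrival rate as (n - 1) inter-arrival gaps over the observed time span."""
    if len(arrival_times) < 2:
        raise ValueError("need at least two arrival timestamps")
    ordered = sorted(arrival_times)
    span = ordered[-1] - ordered[0]
    if span <= 0:
        raise ValueError("arrival timestamps must span a positive interval")
    return (len(ordered) - 1) / span


def service_moments(service_times: Sequence[float]) -> Tuple[float, float]:
    """Sample estimates of E[S] and E[S^2] for one phase."""
    if not service_times:
        raise ValueError("need at least one service-time sample")
    first = fmean(service_times)
    second = fmean([s * s for s in service_times])
    return first, second
```

Using the second moment rather than only the mean matters because queueing delay in an M/G/1 queue grows with service-time variability, not just with the average service time.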
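The key-component simplification selects components with a larger share of the sojourn time and with random variation. The abstract does not give the selection criteria, so the sketch below is only an assumption. It measures "share" as each phase's fraction of the total mean sojourn time and "random variation" as the coefficient of variation of its samples. It keeps a phase as a key component only when both values exceed thresholds. The thresholds `min_share` and `min_cv` are hypothetical. Non-key phases would be modeled as constant delays equal to their measured mean.

```python
from statistics import fmean, pstdev
from typing import Dict, List, Sequence, Tuple


def select_key_phases(
    samples: Dict[str, Sequence[float]],
    min_share: float = 0.1,  # hypothetical: minimum share of total mean sojourn time
    min_cv: float = 0.2,     # hypothetical: minimum coefficient of variation
) -> Tuple[List[str], Dict[str, float]]:
    """Split phases into key phases (modeled as queues) and constant-delay phases.

    Returns the key phase names and the mean sojourn time of every phase; a
    non-key phase contributes only its mean to the simplified model.
    """
    means = {name: fmean(xs) for name, xs in samples.items()}
    total = sum(means.values())
    if total <= 0:
        raise ValueError("total mean sojourn time must be positive")
    key = []
    for name, xs in samples.items():
        share = means[name] / total
        cv = pstdev(xs) / means[name] if means[name] > 0 else 0.0
        if share >= min_share and cv >= min_cv:
            key.append(name)
    return key, means
```

Only the key phases then need per-phase queue parameters, which is how this kind of selection reduces both model complexity and the measurement overhead imposed on the running system.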
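The evaluation reports the share of load conditions in which the model error stays below 8%, and the average and maximum reduction in response time after optimization. The abstract does not define the error metric. The sketch below assumes relative error against the measured value, (model - measured) / measured in absolute value, and relative reduction (before - after) / before. The function names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def relative_errors(predicted: Sequence[float], measured: Sequence[float]) -> List[float]:
    """|model - measured| / measured for each load condition."""
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")
    return [abs(p - m) / m for p, m in zip(predicted, measured)]


def share_within(errors: Sequence[float], tolerance: float = 0.08) -> float:
    """Fraction of conditions whose relative error is below the tolerance."""
    return sum(e < tolerance for e in errors) / len(errors)


def reductions(before: Sequence[float], after: Sequence[float]) -> List[float]:
    """Relative response-time reduction (before - after) / before per condition."""
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return [(b - a) / b for b, a in zip(before, after)]


def summarize(before: Sequence[float], after: Sequence[float]) -> Tuple[float, float]:
    """Average and maximum relative reduction across conditions."""
    r = reductions(before, after)
    return fmean(r), max(r)
```

With this metric, a result such as "error below 8% in 90% of cases" corresponds to `share_within(errors, 0.08) >= 0.9`.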
Keywords/Search Tags: big data, batched stream computing system, Spark Streaming, performance modeling, queuing theory