Research On Load Balancing And Fault Tolerant Mechanism In Big Data Stream Processing System

Posted on: 2018-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Duan
Full Text: PDF
GTID: 2348330518461415
Subject: Computer system architecture
Abstract/Summary:
In fields such as financial data analysis, stock exchanges, and network security, streaming data is increasingly characterized by massive volume and high speed. These data streams arrive continuously at unpredictable rates, and their value decreases over time. The mature HDFS/MapReduce batch-processing framework for massive data cannot handle such data streams well in real time, so big data stream processing has become increasingly important in some application areas. With the recent development of stream processing, many distributed stream processing frameworks with good performance have emerged, such as Spark Streaming, S4, Storm, and Samza. However, these existing frameworks are deficient in their load-balancing and fault-tolerant mechanisms. As key techniques in such frameworks, load balancing and fault tolerance are important factors that determine processing capability, reliability, and stability, and hence have become a research focus.

In this thesis, we first analyze the classification of load-balancing algorithms and their implementations. At present, most of the strategies adopted in stream processing systems are static and cannot handle the impact of dynamic data streams on system load balance. To resolve this, we propose a dynamic load-balancing strategy for big data stream processing systems based on predicting the load of the cluster nodes. We introduce the Ganglia system as the cluster load monitoring system and analyze the feasibility of predicting the load of the cluster nodes. Using a Grey-Markov model, we predict node load from the collected historical load data and make load-balancing decisions according to the predictions, selecting the nodes and operators that need to be migrated and carrying out the migration. We also improve the node selection algorithm used for migration, which improves load-balancing performance.

In addition, we analyze four commonly used failure recovery strategies. We also analyze the mainstream stream processing systems one by one, namely Spark Streaming, S4, Storm, and Samza, with a focus on their fault-tolerant mechanisms. We point out their drawbacks and, based on the characteristics of streaming data and its processing systems, design a fault-tolerant mechanism for node failures in stream processing clusters. Through experiments, we show that the Grey-Markov prediction model is applicable to predicting the load of stream processing clusters. We validate our dynamic load-balancing algorithm by comparing the cluster load before and after applying it. We also compare our algorithm with other algorithms on two indicators: cluster processing delay and the number of operator migrations. The results show that both the performance of the stream processing cluster and its load balance are improved.
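
The abstract does not spell out the Grey-Markov model, so the following is only an illustrative sketch of one common textbook formulation, not the thesis's actual implementation. It fits a GM(1,1) grey model to a short history of positive node-load samples, then corrects the next-step forecast with a first-order Markov chain over the relative residuals. All function names, the choice of three residual states, and the sample load values are assumptions made for this example.

```python
import math


def fit_gm11(series):
    """Least-squares fit of the GM(1,1) parameters (a, b) to a positive series."""
    if len(series) < 4:
        raise ValueError("GM(1,1) needs at least 4 samples")
    # Accumulated series x1 and its mean-generated background values z.
    x1, running = [], 0.0
    for value in series:
        running += value
        x1.append(running)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(series))]
    y = list(series[1:])
    # Ordinary least squares for y = slope * z + b, where slope = -a.
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(u * v for u, v in zip(z, y))
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy - slope * sz) / m
    return -slope, b


def gm11_value(first, a, b, k):
    """GM(1,1) estimate of the k-th sample (k = 0 is the first sample)."""
    if k == 0:
        return first
    if abs(a) < 1e-12:
        return b  # limiting case: the model degenerates to a constant
    c = first - b / a
    return c * (math.exp(-a * k) - math.exp(-a * (k - 1)))


def markov_correction(actual, fitted, n_states=3):
    """Relative correction taken from a Markov chain over residual states."""
    rel = [(x - f) / f for x, f in zip(actual, fitted)]
    lo, hi = min(rel), max(rel)
    if hi - lo < 1e-12:
        return lo  # all residuals are equal, so no states to distinguish
    width = (hi - lo) / n_states
    states = [min(int((e - lo) / width), n_states - 1) for e in rel]
    # Count observed one-step transitions between residual states.
    counts = [[0] * n_states for _ in range(n_states)]
    for s, t in zip(states, states[1:]):
        counts[s][t] += 1
    row = counts[states[-1]]
    if sum(row) == 0:
        next_state = states[-1]  # no history for this state: stay put
    else:
        next_state = max(range(n_states), key=row.__getitem__)
    # Use the midpoint of the predicted state's residual interval.
    return lo + (next_state + 0.5) * width


def predict_next_load(loads, n_states=3):
    """Grey-Markov one-step-ahead forecast of a node's load."""
    a, b = fit_gm11(loads)
    n = len(loads)
    fitted = [gm11_value(loads[0], a, b, k) for k in range(1, n)]
    correction = markov_correction(loads[1:], fitted, n_states)
    return gm11_value(loads[0], a, b, n) * (1.0 + correction)


# Hypothetical CPU-load samples for one node, e.g. polled from a monitor.
history = [0.42, 0.45, 0.47, 0.52, 0.55, 0.58, 0.63]
print(f"predicted next load: {predict_next_load(history):.3f}")
```

In a scheme like the one the abstract describes, a forecast of this kind would be computed per node from monitored history, and nodes whose predicted load crosses a chosen threshold would become candidates for operator migration. The threshold and the migration policy are outside the scope of this sketch.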
Keywords/Search Tags:big data, stream processing, load balancing, fault tolerance