Research On Fault-tolerant Strategy Optimization For Flink Stream Processing Framework

Posted on:2020-05-06

Degree:Master

Type:Thesis

Country:China

Candidate:X Qing

Full Text:PDF

GTID:2428330590474464

Subject:Computer Science and Technology

Abstract/Summary:

With the development of big data and Internet of Things technology, a large number of real-time applications have emerged. Such applications require data to be collected, processed, and analyzed in real time, with results delivered at sub-second latency. Stream computing is a computing paradigm designed for this kind of real-time workload. Stream applications usually run without interruption, so they inevitably encounter faults during operation, especially in large-scale distributed environments. Fault-tolerant recovery has therefore long been a research focus in stream computing. Traditional fault-tolerance strategies for streaming applications include active backup, passive backup, upstream backup, and checkpoint-based rollback recovery, each with its own advantages and disadvantages.

Flink, a stream processing framework, implements lightweight asynchronous checkpointing based on the barrier model. However, it still has shortcomings in practice. First, Flink supports only fixed-interval checkpoints. The checkpoint interval is a significant parameter affecting both fault-tolerance overhead and recovery time; if it could be adjusted according to dynamic changes in the stream data, system efficiency would be greatly improved. Second, Flink provides only a checkpoint-based fault-tolerance mechanism. For stream applications with high reliability requirements, a checkpoint-based recovery mechanism alone has difficulty meeting the need for fast recovery.

To address these two problems, this thesis proposes two optimization strategies. The first is a checkpoint interval optimization model. Based on an open-loop Jackson queueing network, the thesis proposes a delay model for application processing and a fault recovery model for checkpoints, and derives from these models a method for optimizing the checkpoint interval. Experimental results show that the performance model fits the actual behavior of the Flink system well and can recommend an optimized checkpoint interval according to system reliability indicators. The second is a partly active backup strategy for critical tasks. Working from the job topology, the thesis uses network connectivity analysis and an improved PageRank algorithm to rank tasks by criticality. On the basis of critical path analysis, the top N key tasks that fit within the resource constraints are identified and actively backed up, which further improves system reliability. Experimental results demonstrate that the proposed partly active backup method makes full use of the system's spare resources and ensures fast recovery of critical tasks, thereby improving the overall reliability of the application.
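To make the task-ranking idea concrete, the following is a minimal sketch that ranks the tasks of a job topology with plain PageRank and keeps the top N. It is not the improved PageRank variant described in the abstract, whose details are not given here, and it omits the connectivity and critical path analysis. The function names (`pagerank`, `top_n_tasks`), the damping factor of 0.85, and the example topology are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over a directed task graph.

    `edges` maps each task name to the list of its downstream tasks.
    """
    nodes = sorted(set(edges) | {v for targets in edges.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by tasks with no downstream edge is spread uniformly,
        # so the scores keep summing to 1.
        dangling = sum(rank[u] for u in nodes if not edges.get(u))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for u in nodes:
            targets = edges.get(u, [])
            for v in targets:
                # Each task passes an equal share of its rank downstream.
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def top_n_tasks(edges, n):
    """Return the n highest-ranked tasks, breaking ties by task name."""
    rank = pagerank(edges)
    return sorted(rank, key=lambda task: (-rank[task], task))[:n]


# Hypothetical job topology: two sources feeding a map, a window, and a sink.
job = {
    "source_a": ["map"],
    "source_b": ["map"],
    "map": ["window"],
    "window": ["sink"],
    "sink": [],
}

if __name__ == "__main__":
    # Tasks that would be candidates for active backup when N = 2.
    print(top_n_tasks(job, 2))
```

In this sketch N stands in for the resource constraint: a scheduler with spare capacity for two extra task instances would pass `n=2`. Plain PageRank favors tasks that many upstream paths flow into, which is one plausible proxy for criticality but not necessarily the one the thesis adopts.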

Keywords/Search Tags:

stream computing, checkpoint interval, queueing model, partly active backup, critical tasks, Flink

Related items

1	Dynamic Adaptive Checkpoint Mechanism For Streaming Applications Based On Reinforcement Learning
2	Design And Implementation Of Stream Computing Platform Based On Flink
3	Research On Fault Tolerance In Distributed Stream Data Processing
4	Research On Resource Scheduling Method Based On Flink Framework Of Computing On Data Stream
5	Design And Implementation Of A Streaming SQL Real-time Computing Platform Based On Apache Flink
6	The Research And Implementation Of Checkpoint Technology Based On WinNT
7	Design And Implementation Of Distributed Real-time Video Target Tracking System Based On Stream Computing
8	Real-time Data Stream Clustering Processing System Research And Implementation Of Reliable Backup Solution
9	Research On Elastic Resource Scheduling Strategy For Big Data Stream Computing
10	The Study Of Self-Adaptive Fault-Tolerant Scheduling For Real-Time Tasks On Cluster Computing