
Research and Implementation of a Reliable Backup Solution for Real-Time Data Stream Cluster Processing Systems

Posted on: 2013-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: M J Lv
GTID: 2248330374986410
Subject: Computer application technology
Abstract/Summary:
Recently, there has been significant interest in applications whose data takes the form of high-volume, continuous data streams. Such applications include financial market monitoring, network monitoring, mobile object tracking, asset tracking, intrusion detection, and ecosystem monitoring. Because these applications monitor real-time events, the value of a result decays rapidly over time, so low-latency processing is a key requirement. Stream processing systems enable efficient implementation of such applications. Currently, many data stream processing systems are geared toward cluster processing, because many applications inherently involve geographically dispersed data sources and the processing capability of a system improves as more servers are used. However, the more computation and communication resources a system uses, the higher the odds of failure. In stream processing, a failure prevents low-latency processing because it blocks the flow of data streams. Worse, it may also lose data that is essential to producing correct results. Reliable backup in data stream cluster processing systems is therefore both an active research topic and a difficult, challenging problem for data stream systems.

This paper considers a novel checkpoint-based high-reliability solution that meets the reliability needs of data stream cluster processing systems through a parallel backup and recovery approach. We first discuss our basic recovery approaches, comparing them in terms of recovery speed, CPU and network utilization, and their relationship to various recovery semantics. Then, using the parallel recovery mode in a cluster processing system, we propose strategies for checkpoint unit formation, prioritized according to operator load, and an algorithm for backup redistribution. From a global perspective, the checkpoint units on each server are backed up on different servers and can thus be recovered in parallel.
Finally, for the system's runtime stage, we propose a dynamic adaptive scheduling algorithm based on a splitting and binding policy for checkpoint units. With an appropriate scheduling algorithm, we can minimize system recovery time, reduce the processing delays caused by faults, and improve system performance.

In summary, this dissertation builds on existing solutions to propose a more efficient solution to the key issues of reliable backup schemes in data stream cluster processing systems, and provides a comprehensive and detailed analysis of the policies and overheads of backup and recovery. It will promote research on backup schemes for data stream systems in both theoretical study and practical applications....
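
The idea that the checkpoint units of one server are backed up on different servers, so that they can be recovered in parallel, can be illustrated with a small sketch. This is not the thesis's backup redistribution algorithm, which the abstract does not specify. The greedy least-loaded placement, the tuple layout `(unit_id, primary_server, load)`, and the function names `distribute_backups` and `recovery_time` are all assumptions made for illustration.

```python
from collections import defaultdict


def distribute_backups(units, servers):
    """Assign each checkpoint unit a backup server other than its primary.

    units:   list of (unit_id, primary_server, load) tuples (hypothetical shape)
    servers: list of server names
    Returns a dict {unit_id: backup_server}.

    Assumed policy (not from the thesis): place the heaviest units first,
    always on the candidate server that currently holds the least backup load.
    """
    backup_load = {s: 0.0 for s in servers}
    placement = {}
    for unit_id, primary, load in sorted(units, key=lambda u: -u[2]):
        candidates = [s for s in servers if s != primary]
        if not candidates:
            raise ValueError("at least two servers are required")
        # Ties are broken by server name so the result is deterministic.
        target = min(candidates, key=lambda s: (backup_load[s], s))
        placement[unit_id] = target
        backup_load[target] += load
    return placement


def recovery_time(placement, units, failed):
    """Rough cost of recovering `failed` in parallel.

    Each backup server restores its share of the failed server's units at
    the same time, so the cost is the largest share on any single server.
    """
    per_server = defaultdict(float)
    for unit_id, primary, load in units:
        if primary == failed:
            per_server[placement[unit_id]] += load
    return max(per_server.values(), default=0.0)


units = [("u1", "A", 4.0), ("u2", "A", 3.0), ("u3", "A", 2.0), ("u4", "B", 1.0)]
placement = distribute_backups(units, ["A", "B", "C"])
parallel_cost = recovery_time(placement, units, "A")
```

In this toy setup the three units of server A are split across B and C, so the estimated parallel recovery cost is the larger of the two shares rather than the sum of all three loads, which is what a single backup server would have to restore.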
Keywords/Search Tags: Data stream, Cluster technology, Checkpoint technology, Backup distribution, Scheduling algorithm