
Design And Implementation Of The Distributed Cache System Oriented To Time-Series Data Streams

Posted on: 2015-09-13
Degree: Master
Type: Thesis
Country: China
Candidate: S Hua
GTID: 2298330422977163
Subject: Software engineering
Abstract/Summary:
In the information age, a variety of data sources continuously generate large amounts of data. In the past, continuous data stream processing systems provided only limited support for data persistence, owing to limitations in demand and technology. However, as businesses and application scenarios evolve, large-scale real-time streaming data must be stored persistently. Much of this streaming data is temporal in nature, and reading and writing data at this scale is very costly, so the data should be persisted in time-series order from the outset.

This thesis first analyzes the characteristics and storage requirements of time-series data streams. It then designs a persistence algorithm and a queuing model oriented to dynamic time-series data, and on that basis implements a distributed cache system oriented to time-series data streams. The system provides a scalable persistent storage solution for time-series data streams: it responds rapidly to storage requests and ensures that data is persisted in time order. The main contributions of this thesis are as follows.

First, based on the characteristics and storage requirements of time-series data streams, we propose two persistence algorithms oriented to time-series data. We then use queuing theory to describe the system, establish a queuing model, and solve the system's design optimization problems. The algorithms and model guarantee that dynamic time-series streaming data is written to disk in time order, which reduces the cost of sorting historical data.

Second, combining the time-series persistence algorithms and models with ideas from current mainstream distributed system architectures, we design and implement a distributed cache system oriented to time-series data streams. The system supports sorting and caching of large volumes of dynamic time-series streaming data and offers highly efficient persistent storage. It uses an in-memory database and batch persistence to greatly improve its handling of concurrent storage requests and its real-time responsiveness.

Finally, the thesis describes the design and implementation of the distributed cache system and tests it with user behavior log data to verify the validity and feasibility of the system architecture and of the sorting and persistence models.
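As one way to picture the requirement that buffered stream data reach disk in time order and in batches, the following is a minimal Python sketch. It is not the thesis's algorithm, which the abstract does not spell out. The class `TimeOrderedBuffer`, the `lateness` and `batch_size` parameters, the rejection policy for very late records, and the plain list standing in for disk are all assumptions made for illustration.

```python
import heapq
from typing import Any, Callable, List, Tuple

Batch = List[Tuple[float, Any]]  # (timestamp, payload) pairs


class TimeOrderedBuffer:
    """Hypothetical in-memory buffer that releases records in timestamp order.

    Records may arrive slightly out of order. A record is held until the
    newest timestamp seen is at least `lateness` ahead of it, then written
    to `sink` in batches of at most `batch_size`.
    """

    def __init__(self, sink: Callable[[Batch], None],
                 lateness: float, batch_size: int) -> None:
        self._heap: List[Tuple[float, int, Any]] = []  # min-heap on timestamp
        self._sink = sink
        self._lateness = lateness
        self._batch_size = batch_size
        self._max_seen = float("-inf")
        self._last_written = float("-inf")
        self._seq = 0  # tie-breaker so payloads are never compared

    def append(self, timestamp: float, payload: Any) -> bool:
        """Buffer one record; return False if it is too late to keep order."""
        if timestamp < self._last_written:
            return False  # assumed policy: reject rather than break ordering
        heapq.heappush(self._heap, (timestamp, self._seq, payload))
        self._seq += 1
        self._max_seen = max(self._max_seen, timestamp)
        self._flush(self._max_seen - self._lateness)
        return True

    def close(self) -> None:
        """Write out everything still buffered."""
        self._flush(float("inf"))

    def _flush(self, watermark: float) -> None:
        batch: Batch = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, payload = heapq.heappop(self._heap)
            batch.append((ts, payload))
            self._last_written = ts
            if len(batch) == self._batch_size:
                self._sink(batch)
                batch = []
        if batch:
            self._sink(batch)


# Usage: a list stands in for disk; timestamps arrive slightly shuffled.
written: Batch = []
buf = TimeOrderedBuffer(sink=written.extend, lateness=5.0, batch_size=2)
for ts in [10, 12, 11, 20, 18, 19, 30]:
    buf.append(ts, f"event-{ts}")
too_late_accepted = buf.append(1, "event-1")  # older than data already written
buf.close()
```

The heap keeps insertion cheap while the watermark bounds how long a record waits, so the sink only ever sees non-decreasing timestamps and no later sort of historical data is needed. How long to wait and how large a batch to write are the kind of tuning questions a queuing model could inform.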
Keywords/Search Tags: Time-Series Data Stream, Distributed Cache, Persistence, Queuing Theory, Redis