
Historical Data Storage Service In Publish/Subscribe System

Posted on: 2016-12-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
Full Text: PDF
GTID: 2308330503976380
Subject: Computer application technology
Abstract/Summary:
The publish/subscribe (P/S) communication paradigm is well suited to large-scale distributed computing environments and has attracted considerable attention in recent years because its communication is loosely coupled in both space and time. Loose coupling in space means that publishers publish data regardless of who subscribes to it, and subscribers receive only the data they are interested in without knowing where it comes from. Loose coupling in time means that publishers and subscribers need not be online at the same time during an interaction: publishers can publish data while subscribers are absent, and subscribers can still obtain the data while the publisher is offline.

A publish/subscribe system is a middleware system that lets the participants in a distributed system interact using the publish/subscribe paradigm. To support loose coupling in time, a publish/subscribe system must be able to store historical data so that subscribers can access all published data of interest throughout the system's life cycle. Much of the literature on historical data storage in publish/subscribe systems focuses on mobility support and historical data caching, and commonly applies a static storage strategy to store all historical data. A static strategy balances load poorly, and some storage nodes easily become hot spots.

To address the problems of the static storage strategy, this dissertation designs a novel solution for storing historical data. A topic's metadata and data are stored separately in the system. The metadata servers are organized in a distributed architecture, and a client locates a metadata server through the Chord ring algorithm. Because the capacity of a single proxy is limited, a segmentation strategy is applied to the storage of each topic: the topic's data is divided into blocks that are distributed across multiple nodes.
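
The idea of locating a topic's metadata server on a Chord ring can be illustrated with a minimal sketch. This is not the dissertation's implementation: the SHA-1 hash, the 32-bit identifier space, the class name `MetadataRing`, and the server names are all assumptions made for illustration, and the finger tables that real Chord uses for O(log n) routing are replaced here by a direct successor search over a sorted list.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 32  # assumed identifier space; the abstract does not specify one


def ring_id(key: str) -> int:
    """Map a string (topic or server name) onto the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class MetadataRing:
    """Hypothetical successor lookup over a fixed set of metadata servers."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one metadata server is required")
        # Sort servers by their position on the ring.
        self._ring = sorted((ring_id(s), s) for s in servers)
        self._ids = [node_id for node_id, _ in self._ring]

    def locate(self, topic: str) -> str:
        """Return the first server whose id is >= the topic id, wrapping around."""
        idx = bisect_left(self._ids, ring_id(topic))
        return self._ring[idx % len(self._ring)][1]


ring = MetadataRing(["meta-0", "meta-1", "meta-2"])
owner = ring.locate("sensors/temperature")
```

Because the mapping depends only on the hash of the topic name, every client that knows the same server set resolves a given topic to the same metadata server without a central directory.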
Because data publishing rates may differ, this dissertation presents a dynamic data partitioning algorithm that maintains an appropriate block size. In keeping with the communication characteristics of publish/subscribe systems, data storage nodes obtain a topic's data by subscribing to the topic. A storage node selection algorithm based on a storing fitness value is therefore presented, which reduces network overhead. To make data within a data block faster to locate, a "Key Frame" is generated while the block is being stored and is saved in the index in the file header, so that data can be located through the index. Moreover, a replication strategy is adopted to cope with node failure. The replicas of a topic's metadata are managed as a chain, and the locations of the data replicas are assigned by the topic's metadata proxy. To reduce the effects of temporary node failure, the actual number of replicas is larger than the specified number, and recovery is launched when the number of replicas falls below the specified number.

Based on the theoretical analysis, a prototype historical data storage system built on these storage strategies is designed and implemented. Comprehensive simulations and experiments are conducted to compare it with the static storage strategy. The results show that the proposed historical data storage approach outperforms the static strategy in both access efficiency and load balancing.
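
The replica-count rule described in the abstract (keep more replicas than specified, recover only when the live count drops below the specified number) can be sketched as follows. The class name `ReplicaPolicy`, the field names, and the surplus of one extra replica are assumptions for illustration; the abstract does not state how many extra replicas are kept or how many are restored on recovery.

```python
from dataclasses import dataclass


@dataclass
class ReplicaPolicy:
    """Hypothetical policy object; names and the default surplus are assumed."""

    specified: int   # replica count requested for a block
    extra: int = 1   # assumed surplus kept to absorb temporary node failures

    @property
    def target(self) -> int:
        """Number of replicas actually maintained (specified plus surplus)."""
        return self.specified + self.extra

    def needs_recovery(self, live: int) -> bool:
        """Recovery starts only when live replicas fall below the specified count."""
        return live < self.specified

    def replicas_to_create(self, live: int) -> int:
        """Assumed behaviour: once recovery starts, restore up to the target."""
        if not self.needs_recovery(live):
            return 0
        return self.target - live


policy = ReplicaPolicy(specified=3)
```

With this rule, losing one of the four maintained replicas leaves three live copies and triggers no recovery traffic, which is how the surplus masks a temporary node failure.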
Keywords/Search Tags:Publish/Subscribe system, Historical data, Distributed metadata segmentation, Storage nodes selection, Replication strategy