
Research On Scalable Elastic Content-based Publish/Subscribe Technology

Posted on: 2015-06-09    Degree: Doctor    Type: Dissertation
Country: China    Candidate: X K Ma    Full Text: PDF
GTID: 1108330509960988    Subject: Computer Science and Technology
Abstract/Summary:
The Internet has become an omnipresent platform for computing and storage. One of its pressing problems is how to disseminate data from source nodes to large-scale clients with diverse interests in a real-time and accurate manner. The content-based publish/subscribe (pub/sub) model is becoming the mainstream model of data dissemination, because its loosely coupled communication model suits large-scale distributed systems and it offers users powerful expressiveness. In recent years, the content-based pub/sub model has been applied in time-critical applications such as earthquake monitoring, severe-weather warning, stock updates, social networks, streaming media, and online games. These applications have two characteristics. One is that live content tends to be generated as large numbers of items with high arrival rates and complex structures, which demands highly scalable content-based pub/sub systems (pub/subs) so that live content items reach their interested users in time. The other is that the arrival rate of live content may change dramatically within a short time, which requires content-based pub/subs to elastically adjust their capacity to the system workload in order to achieve low dissemination latency and a high performance-price ratio. However, existing content-based pub/subs fall short of achieving both scalability and elasticity. Specifically, there are three technical challenges. Firstly, event matching efficiently filters out uninterested subscribers and thereby reduces dissemination latency, but under complex subscription patterns the matching throughput of existing event matching methods drops significantly as the number of subscriptions grows, and under dynamic networks and high-dimensional data existing methods suffer low matching throughput and lose events with high probability. Secondly, an efficient event routing technique is a key factor in real-time data dissemination.
When disseminating bulk content, the upload bandwidth of existing content-based pub/subs becomes the bottleneck. Thirdly, an efficient total ordering technique in content-based pub/subs can quickly detect conflicting events and thereby reduce delivery latency, but existing total ordering methods incur high delivery latency under large-scale event arrivals. To achieve scalable and elastic content-based pub/subs, this dissertation studies in depth the techniques of event matching, event routing and total ordering, respectively.

Under complex subscription patterns, event matching throughput is strongly affected by the scale of subscriptions and drops dramatically as their number grows. To this end, this dissertation proposes a general scalable and elastic event matching approach for complex subscription patterns, called GSEM, which adapts to the rapid growth of subscriptions while sustaining high matching throughput. GSEM adopts a two-hop lookup overlay as its general framework in the cloud computing environment. Thanks to the "dispatch-aggregate" scheme of this framework, subscriptions with diverse complex patterns can be transformed into one general pattern, simplifying the event matching process. Through a hierarchical content-space partitioning technique, GSEM dispatches large-scale skewed subscriptions into multiple balanced clusters, each of which is mapped to a uniformly selected matcher to provide parallel matching service. To adapt to changing workloads, GSEM uses a waiting-latency-aware matcher provisioning algorithm that elastically adjusts the matching capacity, ensuring a high performance-price ratio. A prototype deployment on the CloudStack platform shows that GSEM's matching capacity increases linearly with the number of servers and the partitioning granularity, and that GSEM elastically adjusts the number of servers and tolerates a large number of server failures with low latency and traffic overhead.
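The abstract gives no implementation details, but the dispatch step of a content-space partitioning scheme of this kind can be illustrated with a minimal one-attribute Python sketch. All names, the flat (non-hierarchical) cell boundaries, and the range-subscription format below are hypothetical simplifications, not GSEM's actual design:

```python
from bisect import bisect_right

# Hypothetical one-attribute sketch of a dispatch-aggregate scheme:
# the content space is cut into cells at fixed boundaries, each cell
# is owned by one matcher, and an event is dispatched only to the
# cell holding its value, so matchers filter subscriptions in parallel.

def build_cells(subscriptions, bounds):
    """Replicate each range subscription into every cell it overlaps."""
    cells = [[] for _ in range(len(bounds) + 1)]
    for lo, hi, sub_id in subscriptions:
        for c in range(bisect_right(bounds, lo), bisect_right(bounds, hi) + 1):
            cells[c].append((lo, hi, sub_id))
    return cells

def match(cells, bounds, value):
    """Dispatch an event to its owning cell, then filter locally."""
    cell = cells[bisect_right(bounds, value)]
    return sorted(sub_id for lo, hi, sub_id in cell if lo <= value <= hi)

subs = [(0, 10, "s1"), (5, 20, "s2"), (15, 30, "s3")]
bounds = [10, 20]                # three cells: up to 10, 10-20, above 20
cells = build_cells(subs, bounds)
print(match(cells, bounds, 7))   # prints ['s1', 's2']
```

In GSEM the partitioning is hierarchical and multi-dimensional and each cell is served by a cloud matcher; the sketch only shows why dispatching an event to a single cell lets the matchers work in parallel.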
Compared with existing cloud-based pub/sub systems, the matchers in GSEM achieve better load balance and higher matching throughput under various parameter settings.

Under dynamic network environments, high-dimensional data dissemination suffers from low matching throughput and a high event loss rate. To this end, this dissertation proposes a scalable and reliable event matching approach for high-dimensional live content, called SREM. SREM connects servers in the cloud computing environment through a distributed overlay called Skip Cloud. Skip Cloud organizes servers into multi-level random graphs, where the prefix relations between adjacent levels ensure efficient routing, and exchanging neighbor lists among servers at the same level provides highly reliable links. Through a hybrid content-space partitioning algorithm, SREM maps large-scale, high-dimensional, skewed subscriptions into multiple subspaces, balancing matching reliability against memory overhead. Within each subspace, SREM applies the hierarchical content-space partitioning technique to ensure high matching throughput. Under various parameter settings, a prototype deployment on the CloudStack platform demonstrates that the traffic overhead of routing events in Skip Cloud is smaller than in a Chord overlay, and that the matching rate of SREM is between 3.7 and 40.4 times that of Blue Dove. Moreover, SREM brings the event loss rate back to 0 within tens of seconds even when a large number of servers fail simultaneously.

The upload bandwidth of existing event routing approaches becomes the bottleneck when disseminating bulk content. To this end, this dissertation proposes a scalable and elastic event routing approach oriented to bulk content, called SEER. SEER adopts a hybrid two-layer overlay in the cloud computing environment to execute the event matching service and the event routing service in parallel.
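Returning briefly to the prefix routing underlying Skip Cloud: the idea of hopping to a neighbor that matches one more digit of the target prefix can be sketched in a few lines of Python. The digit-string server IDs and the global server list standing in for per-level neighbor lists are illustrative assumptions, not the dissertation's actual protocol:

```python
# Hypothetical sketch of greedy prefix routing in a multi-level overlay:
# each server has a digit-string ID; at level i, servers sharing an
# i-digit prefix form a cluster, and a message is forwarded to any
# neighbor matching one more digit of the target prefix.

def shared_prefix(a, b):
    """Length of the common prefix of two ID strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def route(servers, start, target_prefix):
    """Greedy prefix routing; returns the hop sequence of server IDs."""
    current, path = start, [start]
    while not current.startswith(target_prefix):
        matched = shared_prefix(current, target_prefix)
        # in the real overlay this candidate comes from the
        # level-(matched+1) neighbor list, not a global server list
        current = next(s for s in servers
                       if shared_prefix(s, target_prefix) > matched)
        path.append(current)
    return path

servers = ["000", "012", "101", "110", "111"]
print(route(servers, "000", "11"))  # prints ['000', '101', '110']
```

Each hop fixes at least one more digit, so the hop count is bounded by the target prefix length, which is what makes routing across the levels efficient.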
At the top layer, SEER adopts GSEM to achieve scalable and elastic event matching, and multiple helpers are organized into a tree-based topology to ensure real-time routing of bulk content. At the bottom layer, subscriptions are divided into multiple clusters, each of which is managed by a helper. Through a helper-based content distribution algorithm, helpers act as both providers and coordinators to fully exploit the upload capacity of the system. As providers, helpers actively forward chunks to interested subscribers; as coordinators, helpers guide clients with similar interests to exchange their received chunks based on the download information reported by the clients. To adapt to fluctuating workloads, SEER uses a volume-aware helper provisioning algorithm that ensures low dissemination latency and a high performance-price ratio. A prototype deployment on the CloudStack platform demonstrates that SEER linearly reduces the download completion time as the number of servers grows, adaptively adjusts the upload capacity within tens of seconds in response to workload changes, and ensures reliable data dissemination even when a large number of nodes churn frequently or fail instantaneously. Compared with state-of-the-art approaches, the two-layer framework of SEER shows better scalability, its helpers achieve better load balance, and SEER delivers lower dissemination latency under various parameter settings.

To address the low efficiency of existing large-scale total ordering approaches, this dissertation proposes a scalable and elastic total ordering approach for events with high arrival rates, called SETO. SETO adopts a two-layer framework to organize servers in the cloud computing environment and execute event matching and total ordering in parallel. At the top layer, SETO adopts GSEM to achieve high matching throughput.
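The coordinator role of a SEER-style helper, pairing clients so that each obtains a missing chunk from a peer, can be roughly sketched in Python. The chunk-bitmap report format and the one-assignment-per-client-per-round policy are assumptions made for illustration only:

```python
# Hypothetical sketch of a helper acting as coordinator: from the
# chunk sets reported by its clients, it pairs each client with a
# peer that already holds a chunk the client is still missing.

def schedule_exchanges(bitmaps):
    """bitmaps: client -> set of chunk ids already downloaded.
    Returns (receiver, sender, chunk) triples for one scheduling round."""
    all_chunks = set().union(*bitmaps.values())
    plan = []
    for receiver, have in sorted(bitmaps.items()):
        for chunk in sorted(all_chunks - have):
            sender = next((c for c, h in sorted(bitmaps.items())
                           if c != receiver and chunk in h), None)
            if sender is not None:
                plan.append((receiver, sender, chunk))
                break  # one assignment per client per round
    return plan

bitmaps = {"a": {0, 1}, "b": {1, 2}, "c": {0, 2}}
print(schedule_exchanges(bitmaps))
# prints [('a', 'b', 2), ('b', 'a', 0), ('c', 'a', 1)]
```

The point of the coordinator role is visible even in this toy form: every transfer in the plan runs between clients, so the helper spends none of its own upload bandwidth on those chunks.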
At the bottom layer, multiple sequencers are organized into a tree-based topology to ensure real-time event routing. Within each sequencer, SETO uses a preceding-graph building algorithm to achieve low delivery latency: the sequencer builds and resolves the precedence relations between conflicting events with low computing overhead, so that non-conflicting events can be delivered to their interested subscribers simultaneously. Through a delivery-latency-aware sequencer provisioning algorithm, SETO elastically adjusts the number of sequencers to adapt to fluctuating workloads. A prototype deployment on the CloudStack platform demonstrates that SETO reaches high concurrency degrees when delivering events under different workloads, linearly reduces the delivery latency as the number of sequencers grows, and adaptively adjusts the number of servers in less than 10 seconds. Compared with approaches using a central sequencer or tree-based sequencers, SETO shows lower delivery latency under diverse parameter settings.
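The preceding-graph idea above can be illustrated with a toy Python sketch. The conflict rule used here (two events conflict when they share an interested subscriber, and conflicting events keep their arrival order) is an assumption for illustration; the dissertation's algorithm and data structures may differ:

```python
from collections import defaultdict

# Hypothetical sketch of preceding-graph delivery: conflicting events
# are ordered by arrival, and all events whose predecessors have
# already been delivered are released in the same concurrent round.

def delivery_rounds(events):
    """events: list of (event_id, interested_subscribers) in arrival order.
    Returns rounds of event ids that can be delivered concurrently."""
    preds = defaultdict(set)
    for i, (eid, subs) in enumerate(events):
        for earlier_id, earlier_subs in events[:i]:
            if subs & earlier_subs:       # shared subscriber -> conflict
                preds[eid].add(earlier_id)
    remaining = {eid for eid, _ in events}
    rounds = []
    while remaining:
        ready = sorted(e for e in remaining if not (preds[e] & remaining))
        rounds.append(ready)
        remaining -= set(ready)
    return rounds

events = [("e1", {"s1"}), ("e2", {"s2"}), ("e3", {"s1", "s2"})]
print(delivery_rounds(events))  # prints [['e1', 'e2'], ['e3']]
```

Here e1 and e2 touch disjoint subscribers and go out together, while e3 conflicts with both and waits one round, which is exactly the concurrency gain a preceding graph buys over a single total order.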
Keywords/Search Tags: Publish/Subscribe, Event Matching, Event Routing, Total Ordering, Scalability, Elasticity