Research On Key Technologies Of High Availability For Storm Cluster

Posted on:2017-08-25

Degree:Master

Type:Thesis

Country:China

Candidate:Q X Ma

GTID:2348330491952347

Subject:Computer system architecture

Abstract/Summary:

In recent years, many big data applications have come to involve multi-source data, data aggregation, and online real-time processing. Hadoop, which handles massive batch-processing workloads, is poorly suited to real-time processing and is constrained in response time and timeliness. Real-time computation has therefore developed rapidly and become one of the most active research areas today.

As a big data tool for real-time processing, Storm can process huge data streams easily and reliably, and many enterprises are now researching and building real-time computation systems based on it. However, early versions of Storm had only one Nimbus node per cluster. This single point of failure lowers availability and makes it very difficult to provide continuous service. Moreover, in real production environments, cluster versions and business types are complex and diverse, so an efficient monitoring system is still needed to detect abnormal conditions immediately; otherwise, failures may cause unpredictable losses to enterprises and consumers. Achieving high availability of Storm clusters, and thereby avoiding these losses, has therefore become critical for enterprises.

In this thesis, based on an in-depth study of Storm, we analyze its working principle, its job submission process, and the mechanism of each node. To address the single point of failure, we introduce the distributed coordination system ZooKeeper in detail and study its system architecture, data model, ZAB protocol, and typical application scenarios. Building on this, we focus on a high availability solution that achieves automatic failover for a Storm cluster, and describe in detail leader election and failover, shared storage of topology code, and communication between the client and Nimbus, among other aspects.
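
Leader election with ZooKeeper is commonly built from ephemeral sequential znodes: every Nimbus candidate creates one, the candidate holding the lowest sequence number is the leader, and when the leader's session expires its znode disappears so the next candidate takes over. The sketch below is a minimal in-memory simulation of that general recipe, not the thesis's implementation and not a ZooKeeper client. `FakeZk`, `ELECTION_PREFIX`, and the node names are hypothetical; only the 10-digit zero-padded sequence suffix and the deletion of ephemeral znodes on session expiry mirror documented ZooKeeper behaviour.

```python
# Hypothetical in-memory model of the ZooKeeper leader-election recipe.
# FakeZk only mimics two ZooKeeper behaviours: sequential znode numbering and
# deletion of ephemeral znodes when the owning session ends. It is not a client.

ELECTION_PREFIX = "/storm/nimbus-election/n_"  # hypothetical znode path


class FakeZk:
    def __init__(self):
        self._counter = 0
        self.znodes = {}  # znode path -> id of the session that created it

    def create_ephemeral_sequential(self, prefix, session_id):
        # ZooKeeper appends a 10-digit, zero-padded, increasing counter.
        path = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self.znodes[path] = session_id
        return path

    def expire_session(self, session_id):
        # A crashed Nimbus stops heartbeating; its session expires and its
        # ephemeral znodes are removed automatically.
        for path in [p for p, s in self.znodes.items() if s == session_id]:
            del self.znodes[path]

    def sorted_children(self, prefix):
        # Zero padding makes lexicographic order equal numeric order.
        return sorted(p for p in self.znodes if p.startswith(prefix))


def leader(zk):
    """The candidate owning the lowest sequence number is the leader."""
    children = zk.sorted_children(ELECTION_PREFIX)
    return zk.znodes[children[0]] if children else None


def watched_predecessor(zk, my_path):
    """Each standby watches only the znode just before its own, so a leader
    failure wakes one successor instead of every candidate (no herd effect)."""
    children = zk.sorted_children(ELECTION_PREFIX)
    i = children.index(my_path)
    return children[i - 1] if i > 0 else None


if __name__ == "__main__":
    zk = FakeZk()
    paths = {n: zk.create_ephemeral_sequential(ELECTION_PREFIX, n)
             for n in ("nimbus-a", "nimbus-b", "nimbus-c")}
    print(leader(zk))                  # nimbus-a
    zk.expire_session("nimbus-a")      # the leader crashes
    print(leader(zk))                  # nimbus-b takes over
    print(watched_predecessor(zk, paths["nimbus-c"]) == paths["nimbus-b"])  # True
```

Watching only the predecessor, rather than the whole election directory, is the usual design choice because a leader failure then triggers a single notification instead of one per standby Nimbus.
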
Considering the multi-cluster environments that enterprises currently run, we also design a ZooKeeper-based high availability solution for multiple Storm clusters: when one cluster fails completely, its topologies can be quickly migrated to another cluster through a redundancy mechanism between clusters. In addition, a more comprehensive monitoring module, consisting of scheduled tasks and a real-time alert service, monitors the health of each cluster and the running status of its topologies.

In the last part of this thesis, we test the solution to verify its feasibility and effectiveness. Experimental results show that the solution works well: early detection of problems and automatic failure recovery improve the availability of the Storm cluster.
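
The combination of scheduled health checks, alerting, and cross-cluster migration can be illustrated with a small sketch of one monitoring pass. Everything here is an assumption for illustration: `ClusterState`, `monitor_tick`, the heartbeat timestamps, the 30-second timeout, and the "least-loaded healthy cluster" target policy are hypothetical, and a real migration would resubmit each topology to the target Storm cluster rather than just move its name between lists.

```python
# Hypothetical sketch: one run of a scheduled health check across several
# Storm clusters. Heartbeat times, the 30 s timeout and the "least-loaded
# healthy cluster" policy are illustrative assumptions. Real migration would
# resubmit each topology to the target cluster; here we only move names.
from dataclasses import dataclass, field


@dataclass
class ClusterState:
    name: str
    last_heartbeat: float                 # seconds on the monitor's clock
    topologies: list = field(default_factory=list)


def monitor_tick(clusters, now, timeout=30.0):
    """Detect clusters whose heartbeat is stale, migrate their topologies to
    the least-loaded healthy cluster, and return the alert messages raised."""
    healthy = [c for c in clusters if now - c.last_heartbeat <= timeout]
    alerts = []
    for c in clusters:
        if now - c.last_heartbeat <= timeout:
            continue  # cluster is healthy
        if not c.topologies:
            alerts.append(f"ALERT: {c.name} unresponsive; no topologies to move")
            continue
        if not healthy:
            alerts.append(f"CRITICAL: {c.name} is down and no healthy cluster remains")
            continue
        # Re-evaluated on every iteration, since earlier moves change the load.
        target = min(healthy, key=lambda h: len(h.topologies))
        alerts.append(f"ALERT: {c.name} unresponsive; moving {c.topologies} to {target.name}")
        target.topologies.extend(c.topologies)
        c.topologies = []
    return alerts
```

For example, with `cluster-a` last heard from at 80.0 s, `cluster-b` at 118.0 s running `["fraud"]`, and an idle `cluster-c` at 119.0 s, a tick at `now=120.0` flags `cluster-a` (40 s > 30 s) and moves its topologies to `cluster-c`, the healthy cluster with the fewest topologies. In a deployment, such a function would be driven by a scheduler and its alerts forwarded to the real-time alert service.
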

Keywords/Search Tags:

Storm, Zookeeper, High Availability, Single Point of Failure, Cluster Monitoring

Related items

1	The Design And Implementation Of High-availability Cluster Management And Monitoring System Based On Linux
2	The Research Of High Availability Of Transaction Systems
3	Research And Implementation Of HDFS High Availability Based On Cluster
4	Research On Technology Of Failure Detection And Replica Placement For High Availability Cluster System
5	Research And Implementation Of High Availability NAS Cluster
6	Design And Implementation Of Metadata Server For Mass Stream Data Storage System
7	Research And Implementation Of Integrated Maintenance Monitoring And Management System
8	The Study Of High Availability Of Network Servers Cluster System And Management Software Realization
9	The Research And Implement Of Virtualization Technology Based On Cluster Of Server
10	Optimization Study On The Available Performance For HDFS Cloud Storage System