Research And Implementation Of Big Data Real-time Processing System Based On Storm

Posted on: 2016-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: S H Long
GTID: 2308330476953501
Subject: Software engineering
Abstract/Summary:
Real-time processing technologies such as Storm and Spark are currently a hotspot in the field of big data processing. This work is based on our laboratory's project to build a provincial transport and logistics platform. The platform includes big data analytics services based on batch processing and real-time data processing services based on the stream computing system Storm. However, Storm still has several problems in practical application. The round-robin strategy adopted by the default scheduler leads to unbalanced workload between worker nodes. A single scheduling strategy cannot meet diverse business requirements. Moreover, Nimbus is a single point of failure: when it goes down, topology submission and task allocation fail.

To solve these problems, this paper studies the stream processing system Storm and related technologies, analyzes the requirements of real-time data processing in the transport and logistics cloud computing platform, and designs and implements a big data real-time analysis system based on Storm. The system provides real-time data processing services for the SaaS applications of logistics enterprises. It addresses both the uneven distribution of workload between worker nodes caused by Storm's default Topology scheduler and Nimbus's single point of failure. Tests and application show that the system is feasible and effective.

Compared with similar systems, this work has the following characteristics:

(1) To improve system performance and to solve the uneven workload between worker nodes and the single scheduling strategy that result from the default scheduler, this paper proposes the RBS scheduling algorithm, which is based on node resource monitoring, and the SNS algorithm, which schedules workers to a single node. Based on these two algorithms, this paper also designs and implements the corresponding Topology schedulers. Experiments show that the RBS scheduler allocates tasks to nodes with lower resource utilization according to the resource usage of the worker nodes. The SNS scheduler can place all workers of a Topology on a single physical node when the Topology performs only simple arithmetic operations and keeps little intermediate state.

(2) To improve system availability, this paper proposes a solution to Nimbus's single point of failure. Multiple Nimbus nodes are coordinated through Zookeeper, which is used for leader election and for keeping data synchronized between the master Nimbus node and the slave Nimbus nodes. Experiments on a cluster with three Nimbus nodes show that when the master Nimbus goes down, the remaining slave Nimbus nodes can still serve the entire cluster, so the Topology keeps running.

(3) Combining the above work, this paper designs and implements a real-time big data processing system based on Storm, which provides real-time big data analysis services to the SaaS applications of logistics enterprises. The system includes a development environment and a runtime environment for stream computing applications.

The runtime environment for stream processing applications has two parts. The runtime environment for application tasks includes the input stream component, the Topology scheduler based on Ganglia, the Nimbus cluster coordinator based on Zookeeper, and the persistent output component. The runtime environment for data input/output services includes the data fetcher and pre-processor, Kafka middleware, and a NoSQL database.

The application development environment includes an integrated development tool, a testing tool, and a deployment tool. The integrated development tool is based on Eclipse and provides application developers with the data fetcher and pre-processor API, the input stream component API, and the persistent output component API. The testing tool encapsulates stand-alone Storm and provides a simulated runtime environment for stream processing applications.
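The contrast between the default round-robin strategy and the two proposed scheduling ideas can be sketched as follows. This is a minimal illustration, not the thesis's RBS or SNS implementation, whose details the abstract does not give. The function names, the integer load percentages, the fixed per-task cost, and the choice of the least-loaded node for single-node placement are all assumptions made for the example.

```python
from itertools import cycle


def round_robin_assign(tasks, nodes):
    """Assign tasks to nodes in fixed rotation, ignoring node load."""
    assignment = {name: [] for name in nodes}
    rotation = cycle(nodes)
    for task in tasks:
        assignment[next(rotation)].append(task)
    return assignment


def least_loaded_assign(tasks, node_load, task_cost=10):
    """Assign each task to the node whose monitored load is currently lowest.

    node_load maps node name -> utilization in percent (a hypothetical
    monitoring snapshot). task_cost is an assumed load increment per task.
    """
    load = dict(node_load)  # work on a copy so the snapshot is not mutated
    assignment = {name: [] for name in load}
    for task in tasks:
        target = min(load, key=load.get)
        assignment[target].append(task)
        load[target] += task_cost
    return assignment


def single_node_assign(tasks, node_load):
    """Place every task on one node (assumed here: the least loaded one)."""
    target = min(node_load, key=node_load.get)
    return {name: (list(tasks) if name == target else []) for name in node_load}


# Hypothetical cluster snapshot and task list.
snapshot = {"node-a": 80, "node-b": 20, "node-c": 50}
tasks = ["t1", "t2", "t3", "t4", "t5", "t6"]

rr = round_robin_assign(tasks, list(snapshot))
rb = least_loaded_assign(tasks, snapshot)
sn = single_node_assign(tasks, snapshot)
```

With this snapshot, round-robin gives the heavily loaded `node-a` the same share of tasks as the others, while the resource-aware variant steers tasks away from it. Single-node placement trades load spreading for the absence of inter-node traffic, which matches the abstract's description of SNS as suited to Topologies with simple computation and little intermediate state.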
Keywords/Search Tags: Storm, Real-time processing, Stream processing, Topology scheduling, Transportation and Logistics