
Design And Implementation Of Real-Time Stream Processing System Based On Massive Network Log Data

Posted on: 2018-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: F Liu
GTID: 2348330518996279
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, network resources have grown continuously, and long-running websites and applications produce vast amounts of log data that hold great value. Analyzing network log data makes it possible to understand server operation in real time, gain insight into users' behavioral characteristics, and track current hot topics on the network. This thesis studies and analyzes today's mainstream Storm-based real-time stream processing systems in depth and finds that processing systems built on the Storm framework have the following limitations in computing task scheduling: system resources are allocated unreasonably, system stability is low, resource utilization and performance need improvement, and a well-developed cluster monitoring system is lacking. Using existing open source frameworks, the thesis designs and implements a reliable and efficient real-time processing system for massive network log data based on the Storm real-time computing framework, optimizes Storm's task scheduling, and adds a highly extensible monitoring module.
The thesis mainly includes the following work:
1. Build a highly stable, highly scalable system on the open source frameworks Flume, Kafka, and Storm that collects, preprocesses, and analyzes log data and displays the computed results.
2. Design a data processing mode based on a sliding window model that improves the efficiency of Storm when computing over massive data.
3. Design an adaptive dynamic flow control algorithm and a custom task scheduling algorithm for Storm that take the computational load balance of Storm components into account, improving the throughput and computational efficiency of the system.
4. Design a highly scalable monitoring module for the Storm computing cluster that monitors the cluster's hardware and software and the operation of its topologies.
The main result of the thesis is the design and implementation of a system for real-time processing, analysis, and monitoring of massive log data. Users can analyze the running state of their applications in real time and mine user needs, while the system monitors the operating status of cluster nodes and tasks. This allows users to process and analyze massive log data in real time more safely and reliably.
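The abstract does not describe how the sliding window model in item 2 is built, so the following is only a minimal sketch of the general technique, not the thesis's design. It assumes the window is divided into fixed time slots and that a hypothetical class `SlidingWindowCounter` keeps running totals, so that sliding the window costs only as much as the slot being evicted. All names here are illustrative assumptions.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts keys (e.g. requested URLs) over the last `window_slots` time slots.

    Illustrative sketch only; the slot-based layout is an assumption,
    not the design used in the thesis.
    """

    def __init__(self, window_slots):
        if window_slots < 1:
            raise ValueError("window_slots must be at least 1")
        self.window_slots = window_slots
        self.slots = deque([Counter()])  # newest slot sits at the right end
        self.totals = Counter()          # running totals across the whole window

    def add(self, key, count=1):
        """Record `count` occurrences of `key` in the current slot."""
        self.slots[-1][key] += count
        self.totals[key] += count

    def advance(self):
        """Open a new slot; evict the oldest one once the window is full."""
        self.slots.append(Counter())
        if len(self.slots) > self.window_slots:
            expired = self.slots.popleft()
            for key, count in expired.items():
                self.totals[key] -= count
                if self.totals[key] == 0:
                    del self.totals[key]  # keep the totals free of stale keys

    def top(self, n):
        """Return the `n` most frequent keys in the current window."""
        return self.totals.most_common(n)


window = SlidingWindowCounter(window_slots=2)
window.add("/index")
window.add("/index")
window.add("/login")
window.advance()          # window now covers slots 1 and 2
window.add("/login")
window.advance()          # slot 1 is evicted; its counts leave the totals
result = window.top(1)
```

In a Storm topology, a bolt could call `advance()` on a timer tick and `add()` for each incoming log tuple. Keeping running totals avoids recomputing the whole window on every slide, which is one common way such a model reduces repeated work.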
Keywords/Search Tags: Storm, real-time computing, massive log data, monitor