Design And Implementation Of A Platform For Realtime And Multidimensional Analytics Of Large-scale Log Data Based On Storm

Posted on: 2018-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: H B Zhao
Full Text: PDF
GTID: 2348330563452383
Subject: Computer technology
Abstract/Summary:
At present, it is common for the daily increment of log data at domestic Internet companies to reach the terabyte level, and real-time multidimensional statistical analysis of large-scale log data is becoming increasingly important for enterprise operation, management, and decision-making. However, current technology for analyzing and processing large-scale log data is highly specialized. Users must not only be familiar with several large and complex distributed systems but also write different programs for different data processing needs. Business departments and operation and maintenance departments, whose demand for data processing is the most urgent, rarely have this capability.

To address these problems, this thesis designs and implements a real-time multidimensional statistical analysis platform for large-scale log data, named Flying Streaming. First, by integrating several open source systems (Flume, Kafka, Storm, and HBase), the thesis designs a basic platform architecture for real-time analysis of large-scale log data. Flume collects and aggregates data from multiple log sources; Kafka buffers data during traffic surges and supports recovery when data is lost during transmission or analysis; Storm performs real-time distributed computation; and HBase stores the analysis results, providing real-time reads and writes of the results for wider use.

Second, the thesis designs the multidimensional analysis mechanism, which is divided into four stages: data source access, multidimensional data extraction, multidimensional aggregation calculation, and persistence of the multidimensional aggregation results. Data source access pulls cached data from Kafka in a distributed manner. Multidimensional data extraction extracts dimension data and measure data from the log data in a distributed manner, according to the configuration of the users' tasks. Multidimensional aggregation calculation computes the measure results according to the task configuration. Persistence of the multidimensional aggregation results stores the analysis results in a distributed database.

Third, the thesis designs the hot response mechanism, which is divided into a task configuration layer, a configuration persistence layer, and a distributed computing layer. The task configuration layer is implemented by the front-end Web UI, the configuration persistence layer by MySQL, and the distributed computing layer in the Storm topology. Finally, both mechanisms are implemented in a Storm topology program, forming a unified multidimensional statistical analysis platform for large-scale log data in Internet enterprises.

With Flying Streaming, users do not need to write big data programs. They only need to submit a task configuration in the Web UI to hot-submit a new task or to hot-update or hot-delete an existing one, and the analysis results are available as charts in the Web UI. To verify Flying Streaming, the thesis tests its multidimensional statistical analysis function in actual production use and tests its throughput and latency with a simulated large-scale log data stream. The practice and experiments show that Flying Streaming works well in Internet enterprises and can meet the needs of most business departments and operation and maintenance departments.
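
The extraction and aggregation stages of the multidimensional analysis mechanism can be illustrated with a minimal, single-process Python sketch. It is not the thesis's Storm implementation: the task configuration format (`TASK`), the log layout, and the function names `extract` and `aggregate` are all hypothetical, and Kafka access and HBase persistence are replaced by an in-memory list and a plain dictionary.

```python
import re
from collections import defaultdict

# Hypothetical task configuration (illustrative only, not the platform's format):
# a regex that names the log fields, the fields used as dimensions, and the
# measures to compute for each combination of dimension values.
TASK = {
    "pattern": r"(?P<ip>\S+) (?P<path>\S+) (?P<status>\d{3}) (?P<bytes>\d+)",
    "dimensions": ["path", "status"],
    "measures": {"requests": ("count", None), "bytes_total": ("sum", "bytes")},
}


def extract(line, task):
    """Extraction stage: return (dimension key, parsed fields) or None."""
    match = re.match(task["pattern"], line)
    if match is None:
        return None  # skip lines that do not fit the configured layout
    fields = match.groupdict()
    key = tuple(fields[name] for name in task["dimensions"])
    return key, fields


def aggregate(lines, task):
    """Aggregation stage: accumulate every measure per dimension key."""
    results = defaultdict(lambda: defaultdict(int))
    for line in lines:
        extracted = extract(line, task)
        if extracted is None:
            continue
        key, fields = extracted
        for name, (op, field) in task["measures"].items():
            if op == "count":
                results[key][name] += 1
            elif op == "sum":
                results[key][name] += int(fields[field])
    # Convert to plain dicts, standing in for the persistence stage.
    return {key: dict(values) for key, values in results.items()}


# Stand-in for data pulled from the message queue.
LOGS = [
    "10.0.0.1 /index 200 512",
    "10.0.0.2 /index 200 256",
    "10.0.0.3 /login 500 64",
    "malformed line",
]

RESULT = aggregate(LOGS, TASK)
```

Because the dimensions and measures live in `TASK` rather than in the code, changing what is analyzed only requires a new configuration, which is the idea behind submitting tasks from the Web UI instead of writing a program for each analysis.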
Keywords/Search Tags: Storm, Large-scale Log Data, Realtime Analytics, Multidimensional Analytics, General Platform