
Research And Design Of A Distributed Real-time Log Analysis System

Posted on: 2018-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Zhou
GTID: 2428330515499724
Subject: Computer Science and Technology
Abstract/Summary:
With the arrival of the big data era, the practical value of big data has become increasingly prominent. Techniques for processing and analyzing static data sets are now relatively mature, and handling data at the TB or even PB scale is no longer difficult. Such techniques process data offline in batches, however, and cannot meet the demand for real-time data processing. At the same time, a growing volume of online data requires real-time processing, which places higher demands on data processing techniques and brings new challenges.

First, building on real-time stream processing techniques for massive log data, this thesis proposes and designs a distributed real-time processing system and describes its overall architecture in detail. The system consists mainly of an acquisition module, a publish-subscribe and storage module, and a real-time processing module. The acquisition module addresses several problems in collecting and aggregating log data, such as heterogeneous data sources, uneven generation rates, and unreliable transmission. The publish-subscribe and storage module resolves the speed mismatch between the acquisition module and the real-time processing module, and extends the Sink. The real-time processing module provides a highly available, high-performance real-time processing platform; with the addition of a hot-swap module, it allows a real-time processing program to continue its incremental computation when its code is updated.

Second, this thesis proposes a KNN classification algorithm for streaming data based on Storm, named the S-KNN algorithm. It is a parallelized version of the KNN algorithm and is applied to the real-time classification of online data streams for the first time. The algorithm first partitions the whole sample set into multiple pieces, then computes the local K nearest neighbors of each incoming sample vector on every piece, and finally reduces the local K nearest neighbors to the global K nearest neighbors by parallel reduction. Finally, the S-KNN algorithm is demonstrated on the distributed real-time log processing system with good results, which shows that the system is efficient and horizontally scalable and that the S-KNN algorithm is suitable for real-time classification of massive online log data streams.
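The partition, local-search, and reduce steps of S-KNN described above can be illustrated with a minimal single-process sketch. This is not the thesis's Storm implementation: the function names (`partition`, `local_knn`, `merge_knn`, `classify`), the Euclidean distance, and the majority vote are all assumptions made for illustration, and the pieces are processed sequentially here rather than on parallel workers.

```python
import heapq
from collections import Counter
from functools import reduce
from math import dist


def partition(samples, n_parts):
    """Split the labelled sample set into n_parts pieces (round-robin)."""
    return [samples[i::n_parts] for i in range(n_parts)]


def local_knn(piece, query, k):
    """Return the k nearest neighbours of query within one piece,
    as a list of (distance, label) pairs."""
    return heapq.nsmallest(k, ((dist(vec, query), label) for vec, label in piece))


def merge_knn(a, b, k):
    """Reduce two candidate lists to the k nearest candidates overall."""
    return heapq.nsmallest(k, a + b)


def classify(pieces, query, k):
    """Classify query by majority vote over its global k nearest neighbours."""
    # Step 1: each piece computes its local k nearest neighbours
    # (in a distributed setting, one worker per piece; assumed here).
    candidates = [local_knn(piece, query, k) for piece in pieces]
    # Step 2: pairwise reduction of local results into the global result.
    global_knn = reduce(lambda a, b: merge_knn(a, b, k), candidates)
    # Step 3: majority vote over the labels of the global neighbours.
    votes = Counter(label for _, label in global_knn)
    return votes.most_common(1)[0][0]
```

Because every piece returns its own k best candidates, the true global k nearest neighbours are always among the merged candidates, so the reduction gives the same neighbours as a search over the unpartitioned sample set.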
Keywords/Search Tags: Big data, Stream data, Real-time computing, Storm, KNN