
The Research Of Log Processing Platform Based On Apache Kafka

Posted on: 2018-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: X H Fei
Full Text: PDF
GTID: 2348330515496657
Subject: Engineering
Abstract/Summary:
Big data is now inseparable from work and daily life. Commercially, big data technology is widely used for statistical analysis of user behavior and for mining potential business value. Before any of this can happen, the data must be transferred to the statistical analysis platform. In user behavior analysis, the main data sources are mobile terminal logs and web logs, and there are many ways to collect them. How the collected logs are brought into the statistical analysis platform, and how the stability and efficiency of the data access system are ensured, is very important, because these directly affect the quality of the subsequent data analysis and data mining tasks. This paper introduces a new log processing architecture based on the data analysis platform of NetEase. Data analysis today usually takes two forms, off-line analysis and online real-time analysis, and different application scenarios call for different analysis strategies. Whatever the type of analysis, the data source must first be connected to the statistical analysis platform. This paper implements FHDFSConnector on top of the Connector interface provided by Kafka 0.10.0.0. It stores log messages in the HDFS distributed file system in real time, providing data support for subsequent off-line statistics, analysis, mining, and prediction. The paper also presents a new big data processing architecture that, unlike the traditional one, supports both off-line and real-time computing. It is also more scalable and has higher throughput. In addition, the architecture reduces the number of components in the system and is easier to maintain. The main work of this paper is as follows. First, every module of the entire log processing platform is designed. Second, the design and implementation of FHDFSConnector are introduced. Finally, a new log processing architecture is designed, and experiments and comparisons are conducted.
Keywords/Search Tags: Big data, Kafka, Spark, HDFS, log processing