
Research And Implementation Of Log Analysis System Based On ELK And Spark

Posted on: 2019-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Yuan
Full Text: PDF
GTID: 2428330548963642
Subject: Software engineering

Abstract/Summary:
In the era of big data, as storage and computing clusters continue to expand, it has become increasingly urgent for enterprises to analyze their massive data and extract the information that is valuable to them. Achieving this goal requires a log processing solution that fits the company's own application scenarios, and distributed computing platforms for mass data processing are an ideal foundation for such log analysis. Google File System, MapReduce, and BigTable, the three big data storage and processing technologies that Google introduced between 2003 and 2006, set off the rapid development of big data; they were followed by Hadoop, the open-source big data framework derived from those three Google papers, which delivered excellent performance on the mass data of its time. After years of development, log processing technologies such as Spark and ELK are now widely used in the market. Spark is a fast, large-scale data processing framework based on in-memory computing, with good fault tolerance and scalability. ELK is an open-source log processing platform solution: it can process big data quickly, collect and store logs in a distributed manner, and provide full-text search and statistical analysis.

The research goal of this thesis is the massive logs generated by the various businesses of an enterprise engineering system, including project build logs, compilation logs, central repository access logs, static check results, and so on. Log processing technology with ELK and Spark at its core is introduced from the aspects of log collection, storage, analysis, and display. The system collects and preprocesses massive distributed logs, stores them in a distributed storage system, performs offline batch processing and real-time analysis as required, and visualizes the resulting data. Users do not need to care about how the platform implements the specific acquisition, preprocessing, storage, and computation processes.

To perform log analysis tasks efficiently, this thesis further examines the performance of Spark data processing in the log analysis phase, analyzes in detail the factors that affect Spark performance, and applies commonly used optimization methods. For this business case, a random key suffix (key salting) scheme is proposed to optimize the performance of the Spark cluster. This improves the computational efficiency of the cluster and allows offline and real-time analysis to run quickly together on one system.
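The random key suffix idea can be illustrated outside Spark as a two-stage aggregation: records with a skewed "hot" key are first scattered across several salted sub-keys (in Spark, across shuffle tasks), then the partial results are merged after the suffix is stripped. The function name, the `#` separator, and the four-way salt below are illustrative assumptions, not the thesis's actual implementation.

```python
import random
from collections import defaultdict

def salted_two_stage_count(records, hot_keys, num_salts=4):
    """Count records per key using key salting to spread hot keys.

    Stage 1: each hot key gets a random numeric suffix, so its records
    are spread over num_salts partial buckets instead of one.
    Stage 2: the suffix is stripped and the partial counts are merged.
    """
    # Stage 1: aggregate per salted key (simulates the first shuffle).
    partial = defaultdict(int)
    for key in records:
        if key in hot_keys:
            salted = f"{key}#{random.randrange(num_salts)}"
        else:
            salted = key
        partial[salted] += 1

    # Stage 2: strip the salt and merge partial counts (second shuffle).
    final = defaultdict(int)
    for salted, count in partial.items():
        final[salted.split("#")[0]] += count
    return dict(final)

# A hypothetical skewed access log: one repository path dominates.
logs = ["/repo/a"] * 1000 + ["/repo/b"] * 3
counts = salted_two_stage_count(logs, hot_keys={"/repo/a"})
# counts == {"/repo/a": 1000, "/repo/b": 3}
```

In a Spark job the same effect is achieved by mapping keys to salted keys, running `reduceByKey` once on the salted keys, then mapping the salt away and reducing again; the extra shuffle is usually cheaper than letting one task process the entire hot key.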
Keywords/Search Tags:Log Analysis, Distributed Computing, Spark, ELK