
Research And Implementation Of Log Analysis System Based On ELK And Spark

Posted on: 2019-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Yuan
Full Text: PDF
GTID: 2428330548963642
Subject: Software engineering

Abstract/Summary:
In the era of big data, as storage and computing clusters continue to expand, it has become increasingly urgent for enterprises to analyze their massive data and extract the information that is valuable to them. Achieving this goal requires a log processing solution that fits the company's own application scenarios, and distributed computing platforms for mass data processing are an ideal foundation for such log analysis. Google File System, MapReduce, and BigTable, the three big data storage and processing technologies that Google introduced between 2003 and 2006, set off the rapid development of big data; they were followed by Hadoop, the open-source big data framework derived from those three Google papers, which delivered excellent performance on the mass data of its time. After years of development, log processing technologies such as Spark and ELK are now widely used in the market. Spark is a fast, large-scale data processing framework based on in-memory computing, with good fault tolerance and scalability. ELK is an open-source log processing platform solution: it can process big data quickly, collect and store logs in a distributed manner, and provide full-text search and statistical analysis.

The research goal of this thesis is the massive logs generated by the various businesses of an enterprise engineering system, including project build logs, compilation logs, central repository access logs, static check results, and so on. Log processing technology with ELK and Spark at its core is introduced from the aspects of log collection, storage, analysis, and display. The system collects and preprocesses massive distributed logs, stores them in a distributed storage system, performs offline batch processing and real-time analysis as required, and visualizes the resulting data. Users do not need to care about how the platform implements the specific acquisition, preprocessing, storage, and computation processes.

To perform log analysis tasks efficiently, this thesis further examines the performance of Spark data processing in the log analysis phase, analyzes in detail the factors that affect Spark performance, and applies commonly used optimization methods. For this business case, a random key suffix (key salting) scheme is proposed to optimize the performance of the Spark cluster. This improves the computational efficiency of the cluster and allows offline and real-time analysis to run quickly together on one system.
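The random key suffix idea can be illustrated outside Spark as a two-stage aggregation: records with a skewed "hot" key are first scattered across several salted sub-keys (in Spark, across shuffle tasks), then the partial results are merged after the suffix is stripped. The function name, the `#` separator, and the four-way salt below are illustrative assumptions, not the thesis's actual implementation.

```python
import random
from collections import defaultdict

def salted_two_stage_count(records, hot_keys, num_salts=4):
    """Count records per key using key salting to spread hot keys.

    Stage 1: each hot key gets a random numeric suffix, so its records
    are spread over num_salts partial buckets instead of one.
    Stage 2: the suffix is stripped and the partial counts are merged.
    """
    # Stage 1: aggregate per salted key (simulates the first shuffle).
    partial = defaultdict(int)
    for key in records:
        if key in hot_keys:
            salted = f"{key}#{random.randrange(num_salts)}"
        else:
            salted = key
        partial[salted] += 1

    # Stage 2: strip the salt and merge partial counts (second shuffle).
    final = defaultdict(int)
    for salted, count in partial.items():
        final[salted.split("#")[0]] += count
    return dict(final)

# A hypothetical skewed access log: one repository path dominates.
logs = ["/repo/a"] * 1000 + ["/repo/b"] * 3
counts = salted_two_stage_count(logs, hot_keys={"/repo/a"})
# counts == {"/repo/a": 1000, "/repo/b": 3}
```

In a Spark job the same effect is achieved by mapping keys to salted keys, running `reduceByKey` once on the salted keys, then mapping the salt away and reducing again; the extra shuffle is usually cheaper than letting one task process the entire hot key.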
Keywords/Search Tags:Log Analysis, Distributed Computing, Spark, ELK