Application Research Of Real-time Data Analysis Based On Spark Computing

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2428330620976058
Subject: Engineering
Abstract/Summary:
With the rapid development of the Internet, data of all kinds is growing explosively. The continuous accumulation of massive data places higher demands on storage and computation, and new distributed computing frameworks and distributed storage models keep emerging. The distributed file system HDFS is widely used for its practicality, and the Spark computing framework has attracted wide attention from academia and the wider community for its high availability. An urgent problem is how to combine these two technologies sensibly to process log data and to present the analysis results with visualization tools. Achieving this requires data analysis solutions tailored to the corresponding business scenarios.

This thesis designs and develops a web log data analysis system based on the Hadoop platform, in which the components of the Hadoop ecosystem provide offline log analysis and computation. The system uses the Spark Streaming framework for the real-time log computing application and the MapReduce framework for the offline computing application. The front end is built on the mainstream Java EE platform, and back-end frameworks such as Spring MVC improve maintainability and scalability. The system also provides a web application built on HTML5 pages, so that users can obtain multidimensional statistics on the analysis results. For data display, interactive chart libraries such as ECharts and Highcharts provide flexible, customizable visualization of the results.

The work is divided into two parts: real-time data analysis based on Spark, and offline data analysis based on the Hadoop platform. The thesis first introduces the relevant background and key technologies, then presents the platform architecture design, application requirements, module implementation, and visualization design for real-time and offline data processing, and finally builds a test environment and tests the system.
Keywords/Search Tags: Hadoop, Spark, HDFS, log data
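
The abstract does not include any code, so the following is only a minimal sketch of the kind of per-batch log statistics a real-time job might compute. It uses plain Python rather than Spark, and it assumes several things not stated in the thesis: that the web logs are in Common Log Format, that hits per page and per status code are the statistics of interest, and that batches are cut by line count (a streaming framework such as Spark Streaming cuts them by time interval). All names (`parse_line`, `micro_batches`, `count_batch`, `run`, `SAMPLE`) are hypothetical.

```python
import re
from collections import Counter

# Assumed log layout: Common Log Format (not specified in the abstract).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def micro_batches(lines, batch_size):
    """Split the input into small consecutive batches (by count, for simplicity)."""
    for start in range(0, len(lines), batch_size):
        yield lines[start:start + batch_size]


def count_batch(batch):
    """Count hits per page and per status code within one batch."""
    page_hits, status_hits = Counter(), Counter()
    for line in batch:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines instead of failing the batch
        page_hits[record["path"]] += 1
        status_hits[record["status"]] += 1
    return page_hits, status_hits


def run(lines, batch_size=2):
    """Process batches in order and keep running totals across batches."""
    total_pages, total_status = Counter(), Counter()
    for batch in micro_batches(lines, batch_size):
        pages, statuses = count_batch(batch)
        total_pages.update(pages)
        total_status.update(statuses)
    return total_pages, total_status


# Made-up sample lines, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [20/Apr/2021:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [20/Apr/2021:10:00:01 +0800] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [20/Apr/2021:10:00:02 +0800] "GET /index.html HTTP/1.1" 404 0',
    'malformed line',
]

if __name__ == "__main__":
    pages, statuses = run(SAMPLE)
    print(pages.most_common())
    print(statuses.most_common())
```

Keeping the per-batch counting separate from the running totals mirrors the split between processing one batch and maintaining state across batches; the same counting logic could in principle be reused for an offline pass over a complete log file.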