
Profiling And Memory Analysis Of A Typical In-Memory Computing Big Data System

Posted on: 2017-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: W S Su
Full Text: PDF
GTID: 2348330515985760
Subject: Engineering
Abstract/Summary:
The open-source distributed platform Spark has attracted growing attention in the big data computing area in recent years. Its in-memory computing design greatly accelerates iterative computation, but it also makes out-of-memory (OOM) exceptions occur more frequently, and both factors bring Spark performance to the forefront. The need for a unified performance analysis of Spark across different big data applications is therefore increasingly urgent, since such analysis helps optimize system performance and ensure quality of service.

This thesis focuses on the design and implementation of the Spark Profiling Tool (SPT), a unified performance analysis tool that measures and evaluates memory consumption in Spark. SPT collects different kinds of information across the system software stack and analyzes them together. In particular, because OOM exceptions occur frequently in Spark, we use SPT to add further probes and run experiments that yield optimization suggestions. The major contributions of this thesis are as follows:

(1) Determining the overall architecture and main functions of SPT through requirement analysis. SPT consists of three modules: a data gathering module, a data preprocessing module, and a data analysis module.

(2) The data gathering module collects a wide range of performance information from the programming layer, the operating system layer, and the architecture layer. Because the data sources differ, we design different collection methods for them, including a lightweight probe that can be easily switched on and off to trace the system dynamically.

(3) In the data preprocessing module, since the gathered data comes from different sources with different structures, we design a unified intermediate file format that removes redundant information, and we use the Spark programming model to recombine the different files more efficiently.

(4) In the data analysis module, to keep analysis efficient, we again use Spark for high-speed distributed computation; in particular, Spark SQL's UDF and UDAF facilities let us implement customized data analysis functions.

(5) Based on the above work, we implement an SPT prototype system. We run functional tests on every module and use SPT to analyze and mitigate OOM exceptions in Spark. The results confirm that the chosen solutions and key technologies are effective.
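To illustrate the preprocessing idea in point (3), the sketch below shows one plausible way that records from heterogeneous sources, such as a JVM GC log line and an operating-system memory sample, could be normalized into a single intermediate schema sorted by timestamp. The field names, log formats, and the four-field schema are all hypothetical illustrations, not the thesis's actual intermediate file format.

```python
import json

# Hypothetical unified intermediate record: every source is reduced to
# (ts, layer, metric, value), discarding source-specific fields.

def normalize_gc_line(line):
    # Parses a made-up GC log line such as "1620000000 GC pause 35ms".
    ts, _, _, dur = line.split()
    return {"ts": int(ts), "layer": "jvm", "metric": "gc_pause_ms",
            "value": int(dur.rstrip("ms"))}

def normalize_os_sample(sample):
    # Converts a made-up OS probe sample, e.g. {"time": ..., "mem_used_kb": ...}.
    return {"ts": sample["time"], "layer": "os", "metric": "mem_used_kb",
            "value": sample["mem_used_kb"]}

def to_unified(gc_lines, os_samples):
    records = [normalize_gc_line(l) for l in gc_lines]
    records += [normalize_os_sample(s) for s in os_samples]
    # Sort by timestamp so downstream analysis can correlate layers in time order.
    return sorted(records, key=lambda r: r["ts"])

records = to_unified(["1620000000 GC pause 35ms"],
                     [{"time": 1620000001, "mem_used_kb": 812340}])
print(json.dumps(records, indent=2))
```

In a real deployment the unified records would be written out as the intermediate files that the Spark-based recombination and analysis stages consume; here plain Python stands in for that pipeline to keep the example self-contained.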
Keywords/Search Tags:Spark, data collecting and analyzing, OOM, performance tuning