
Profiling And Memory Analysis Of A Typical In-Memory Computing Big Data System

Posted on: 2017-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: W S Su
Full Text: PDF
GTID: 2348330515985760
Subject: Engineering
Abstract/Summary:
The open-source distributed platform Spark has attracted growing attention in the big data computing area in recent years. Its in-memory computing design greatly accelerates iterative computation, but it also makes out-of-memory (OOM) exceptions occur more frequently, and both factors bring Spark performance to the forefront. The need for a unified performance analysis of Spark across different big data applications is therefore increasingly urgent, since such analysis helps optimize system performance and ensure quality of service.

This thesis focuses on the design and implementation of the Spark Profiling Tool (SPT), a unified performance analysis tool that measures and evaluates memory consumption in Spark. SPT collects different kinds of information across the system software stack and analyzes them together. In particular, because OOM exceptions occur frequently in Spark, we use SPT to add further probes and run experiments that yield optimization suggestions. The major contributions of this thesis are as follows:

(1) Determining the overall architecture and main functions of SPT through requirement analysis. SPT consists of three modules: a data gathering module, a data preprocessing module, and a data analysis module.

(2) The data gathering module collects a wide range of performance information from the programming layer, the operating system layer, and the architecture layer. Because the data sources differ, we design different collection methods for them, including a lightweight probe that can be easily switched on and off to trace the system dynamically.

(3) In the data preprocessing module, since the gathered data comes from different sources with different structures, we design a unified intermediate file format that removes redundant information, and we use the Spark programming model to recombine the different files more efficiently.

(4) In the data analysis module, to keep analysis efficient, we again use Spark for high-speed distributed computation; in particular, Spark SQL's UDF and UDAF facilities let us implement customized data analysis functions.

(5) Based on the above work, we implement an SPT prototype system. We run functional tests on every module and use SPT to analyze and mitigate OOM exceptions in Spark. The results confirm that the chosen solutions and key technologies are effective.
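To illustrate the preprocessing idea in point (3), the sketch below shows one plausible way that records from heterogeneous sources, such as a JVM GC log line and an operating-system memory sample, could be normalized into a single intermediate schema sorted by timestamp. The field names, log formats, and the four-field schema are all hypothetical illustrations, not the thesis's actual intermediate file format.

```python
import json

# Hypothetical unified intermediate record: every source is reduced to
# (ts, layer, metric, value), discarding source-specific fields.

def normalize_gc_line(line):
    # Parses a made-up GC log line such as "1620000000 GC pause 35ms".
    ts, _, _, dur = line.split()
    return {"ts": int(ts), "layer": "jvm", "metric": "gc_pause_ms",
            "value": int(dur.rstrip("ms"))}

def normalize_os_sample(sample):
    # Converts a made-up OS probe sample, e.g. {"time": ..., "mem_used_kb": ...}.
    return {"ts": sample["time"], "layer": "os", "metric": "mem_used_kb",
            "value": sample["mem_used_kb"]}

def to_unified(gc_lines, os_samples):
    records = [normalize_gc_line(l) for l in gc_lines]
    records += [normalize_os_sample(s) for s in os_samples]
    # Sort by timestamp so downstream analysis can correlate layers in time order.
    return sorted(records, key=lambda r: r["ts"])

records = to_unified(["1620000000 GC pause 35ms"],
                     [{"time": 1620000001, "mem_used_kb": 812340}])
print(json.dumps(records, indent=2))
```

In a real deployment the unified records would be written out as the intermediate files that the Spark-based recombination and analysis stages consume; here plain Python stands in for that pipeline to keep the example self-contained.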
Keywords/Search Tags:Spark, data collecting and analyzing, OOM, performance tuning