The Design And Implementation Of Collection And Analysis System For Spark Performance Data

Posted on:2016-08-22

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Wu

GTID:2308330479491517

Subject:Software engineering

Abstract/Summary:

With the rapid development of information technology, distributed computing has made significant progress. Spark, currently a popular distributed computing framework, is widely used by developers in enterprise applications. Developers want to see the performance data of the underlying cluster while Spark jobs are running, so that they can understand how the whole cluster is operating and locate system bottlenecks. This makes performance tuning possible, yielding higher computational efficiency and shorter run times. A collection and analysis system for Spark performance data is therefore important for developers carrying out performance optimization.

This thesis describes in detail the design and implementation of a collection and analysis system for Spark performance data. User requirements were analyzed first. According to those requirements, the system's functionality was divided into four modules: configuration overview, data collection, data processing, and data analysis. The thesis presents the design and implementation of each functional module, the test schemes, and the conclusions.

The Spark performance data collection system is built on the Akka framework and follows a distributed master-slave design, which enables distributed collection and processing of performance data. The data collection module gathers and stores Spark performance data using the DSTAT monitoring tool running on the slave nodes, which ensures that the collected data is correct and available in real time. The module is also highly extensible, which facilitates the development of new services.
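The master-slave collection flow described above can be illustrated with a minimal Python sketch. It is not the thesis's Akka implementation: the `Sample` record, the `Master` class, the column names (`epoch`, `cpu_idle`, `mem_free`), and the simplified one-header CSV layout are all assumptions made for illustration, and the real output layout of the monitoring tool is not reproduced here.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sample:
    """One performance sample from one node (hypothetical fields)."""
    host: str
    epoch: float      # sample timestamp in seconds
    cpu_idle: float   # idle CPU, percent
    mem_free: float   # free memory, bytes


def parse_samples(host, csv_text):
    """Parse simplified CSV text (one header row, then data rows) into Samples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Sample(host, float(row["epoch"]), float(row["cpu_idle"]), float(row["mem_free"]))
        for row in reader
    ]


class Master:
    """Stands in for the master node: stores samples reported by slaves, per host."""

    def __init__(self):
        self.store = {}

    def receive(self, samples):
        for sample in samples:
            self.store.setdefault(sample.host, []).append(sample)


# A slave node would parse its local monitoring output and report it to the master.
master = Master()
slave_output = "epoch,cpu_idle,mem_free\n1.0,90.5,1024\n2.0,80.0,2048\n"
master.receive(parse_samples("slave-1", slave_output))
```

In a real deployment the call to `receive` would be a message sent from a slave-side actor to a master-side actor. The direct method call here only shows the shape of the data being collected.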
The data processing module analyzes the Spark driver log and performance data and displays the results in different types of statistical charts. The data analysis algorithm distinguishes different cluster states, analyzes idle states and the proportion of unused resources, and converts the differences in performance data between servers into a Euclidean distance for load-balance analysis. It generates analysis reports automatically, which is valuable for developers carrying out performance tuning.

The main result of this thesis is the collection and analysis system software for Spark performance data. The system currently runs well and has collected performance data successfully. It helps developers understand the underlying operating status of a Spark cluster and offers a reference for performance tuning.
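As a rough illustration of the Euclidean-distance idea, the sketch below compares each server's utilization vector with the cluster average and flags servers that lie far from it. The abstract does not give the actual algorithm, so the function names, the normalization of metrics to the range 0 to 1, and both threshold values are assumptions chosen for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def load_balance_report(usage, threshold=0.2):
    """Compare each host with the cluster centroid.

    usage: {host: [cpu, mem, ...]}, each metric normalized to [0, 1] (assumed).
    Returns the centroid, the per-host distances, and the hosts whose distance
    exceeds the (hypothetical) threshold.
    """
    hosts = sorted(usage)
    dims = len(usage[hosts[0]])
    centroid = [sum(usage[h][i] for h in hosts) / len(hosts) for i in range(dims)]
    distances = {h: euclidean(usage[h], centroid) for h in hosts}
    outliers = [h for h in hosts if distances[h] > threshold]
    return centroid, distances, outliers


def idle_ratio(cpu_samples, idle_below=0.05):
    """Fraction of samples in which CPU utilization is below idle_below (assumed cutoff)."""
    if not cpu_samples:
        return 0.0
    return sum(1 for u in cpu_samples if u < idle_below) / len(cpu_samples)


# Two similarly loaded servers and one that is noticeably busier.
usage = {"node-a": [0.5, 0.5], "node-b": [0.5, 0.5], "node-c": [0.8, 0.9]}
centroid, distances, outliers = load_balance_report(usage)
```

A small distance for every host suggests an evenly balanced cluster, while a host with a large distance is a candidate for closer inspection. The idle ratio gives a simple view of how often resources sit unused.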

Keywords/Search Tags:

Distributed, Performance Data, Collection and Process, Data Analysis

Related items

1	High-performance Acquisition And Intelligent Analysis Of Large-scale Network Data
2	Research And Implementation Of The Distributed Traffic Performance Collection And Mass Data Analysis System
3	Large Capacity Underground Data Collection, Storage And Analysis System
4	Design And Implementation Of Data Collection And Analysis Platform Based On Distributed Storage
5	Research Of Data Analysis Process Based On MDA
6	Research And Implementation Of Log Collection And Analysis System Based On Big Data
7	Data Collection And Analysis System Research Based On S1 Interface In LTE Network
8	A Realization Method About Data Collection And Analysis In GSM Network By Applying OLE And COM
9	Research And Design On Distributed Agent Framework For Log And Data Collection
10	Design And Implementation Of Embedded Software Performance Testing Tools