
Design And Implementation Of Big Data Processing Visualization Tool Based On Spark

Posted on: 2018-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Tan
Full Text: PDF
GTID: 2348330518996295
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet information technology, the Internet generates enormous amounts of data. How to clean these data and quickly, effectively mine valuable information from them has become an urgent real-world need. Against this background, a variety of big data processing platforms have emerged. The appearance of Hadoop drew attention to MapReduce, a new computing model. Spark, which introduces the RDD data model, handles big data better by exploiting memory, and its iterative computation outperforms Hadoop's.

However, using Spark requires users to acquire considerable Spark-specific expertise. At the same time, some enterprises run Spark clusters on differing hardware resources, so cluster heterogeneity is pronounced, yet Spark's default task scheduling algorithm does not take differences in node capability into account. This paper proposes and implements a Spark-based big data processing visualization tool, adopts the B/S (browser/server) design pattern, and proposes a task scheduling algorithm for heterogeneous Spark clusters. The research work of this paper covers the following two aspects.

Firstly, this paper proposes a Spark-based big data processing visualization tool that lets users design data processing workflows on the Web. By dragging and dropping graphical elements, the user builds the logical flow of a data processing job: defining the data source, specifying computation operators or Spark SQL statements for data processing, and choosing where to store the results. For the process files generated on the Web, a Spark-based computing engine is designed and implemented as a Jar package that parses and executes these files.

Secondly, a Hungarian-algorithm-based bipartite task scheduling algorithm for Spark is proposed.
Considering the heterogeneity of the Spark cluster, the Spark tasks and the nodes are abstracted into a bipartite graph, and a new Spark task scheduling algorithm is implemented by applying the Hungarian algorithm to the estimated delays between tasks and nodes. Finally, this paper experiments with the Spark-based big data processing visualization tool, testing the system in terms of functionality, performance, and algorithm validity. The results show that the proposed tool meets users' basic needs.
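The abstract mentions process files that the Web front end generates and that a Spark-based engine Jar parses, but this page does not reproduce their format. The following is a purely hypothetical sketch (all field names are invented, not the thesis's actual schema) of what such a file might contain, covering the three parts named above: a data source definition, processing steps (operators or a Spark SQL statement), and result storage.

```python
import json

# Hypothetical process file as a Python dict; field names are
# illustrative only, not the schema used in the thesis.
process = {
    "source": {"type": "hdfs", "path": "/data/input.csv", "format": "csv"},
    "steps": [
        {"op": "filter", "expr": "age > 18"},                          # computation operator
        {"op": "sql", "query": "SELECT city, COUNT(*) FROM t GROUP BY city"},  # Spark SQL step
    ],
    "sink": {"type": "hdfs", "path": "/data/output", "format": "parquet"},
}

# The engine Jar would parse a file like this and drive Spark accordingly.
print(json.dumps(process, indent=2))
```

An engine reading such a file would translate each entry in `steps` into the corresponding Spark transformation or SQL query before writing to the sink.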
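The scheduling idea described above is a minimum-cost bipartite assignment: tasks on one side, nodes on the other, with edge weights given by estimated delays. The Hungarian algorithm finds the optimal assignment in O(n³) time; the sketch below (delay values invented for illustration) brute-forces the same optimum over permutations simply to stay self-contained.

```python
from itertools import permutations

def best_assignment(cost):
    """Find the task-to-node assignment minimizing total delay.

    cost[i][j] = estimated delay of running task i on node j.
    The Hungarian algorithm computes the same optimum in O(n^3);
    brute force over permutations is used here only as a
    self-contained illustration for a small matrix.
    """
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, best_perm

# Hypothetical delay matrix: 3 tasks on 3 heterogeneous nodes.
delays = [[4, 1, 3],
          [2, 0, 5],
          [3, 2, 2]]

total, plan = best_assignment(delays)
print(total, plan)  # → 5 (1, 0, 2): task 0→node 1, task 1→node 0, task 2→node 2
```

In a heterogeneous cluster, faster nodes yield smaller entries in the delay matrix, so the optimal matching naturally steers heavier tasks toward more capable nodes.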
Keywords/Search Tags:Big data processing, Spark, Graphical operation, Hungarian algorithm