
Design And Implementation Of Big Data Processing Visualization Tool Based On Spark

Posted on: 2018-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Tan
Full Text: PDF
GTID: 2348330518996295
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of Internet information technology, the Internet generates enormous amounts of data. How to clean these data and quickly, effectively mine valuable information from them has become an urgent real-world need. Against this background, a variety of big data processing platforms have emerged. The appearance of Hadoop drew attention to MapReduce, a new computing model. Spark, which introduces the RDD data model, handles big data better by exploiting memory, and its iterative computation outperforms Hadoop's.

However, using Spark requires users to acquire considerable Spark-specific expertise. At the same time, some enterprises run Spark clusters on differing hardware resources, so cluster heterogeneity is pronounced, yet Spark's default task scheduling algorithm does not take differences in node capability into account. This paper proposes and implements a Spark-based big data processing visualization tool, adopts the B/S (browser/server) design pattern, and proposes a task scheduling algorithm for heterogeneous Spark clusters. The research work of this paper covers the following two aspects.

Firstly, this paper proposes a Spark-based big data processing visualization tool that lets users design data processing workflows on the Web. By dragging and dropping graphical elements, the user builds the logical flow of a data processing job: defining the data source, specifying computation operators or Spark SQL statements for data processing, and choosing where to store the results. For the process files generated on the Web, a Spark-based computing engine is designed and implemented as a Jar package that parses and executes these files.

Secondly, a Hungarian-algorithm-based bipartite task scheduling algorithm for Spark is proposed.
Considering the heterogeneity of the Spark cluster, the Spark tasks and the nodes are abstracted into a bipartite graph, and a new Spark task scheduling algorithm is implemented by applying the Hungarian algorithm to the estimated delays between tasks and nodes. Finally, this paper experiments with the Spark-based big data processing visualization tool, testing the system in terms of functionality, performance, and algorithm validity. The results show that the proposed tool meets users' basic needs.
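The abstract mentions process files that the Web front end generates and that a Spark-based engine Jar parses, but this page does not reproduce their format. The following is a purely hypothetical sketch (all field names are invented, not the thesis's actual schema) of what such a file might contain, covering the three parts named above: a data source definition, processing steps (operators or a Spark SQL statement), and result storage.

```python
import json

# Hypothetical process file as a Python dict; field names are
# illustrative only, not the schema used in the thesis.
process = {
    "source": {"type": "hdfs", "path": "/data/input.csv", "format": "csv"},
    "steps": [
        {"op": "filter", "expr": "age > 18"},                          # computation operator
        {"op": "sql", "query": "SELECT city, COUNT(*) FROM t GROUP BY city"},  # Spark SQL step
    ],
    "sink": {"type": "hdfs", "path": "/data/output", "format": "parquet"},
}

# The engine Jar would parse a file like this and drive Spark accordingly.
print(json.dumps(process, indent=2))
```

An engine reading such a file would translate each entry in `steps` into the corresponding Spark transformation or SQL query before writing to the sink.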
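The scheduling idea described above is a minimum-cost bipartite assignment: tasks on one side, nodes on the other, with edge weights given by estimated delays. The Hungarian algorithm finds the optimal assignment in O(n³) time; the sketch below (delay values invented for illustration) brute-forces the same optimum over permutations simply to stay self-contained.

```python
from itertools import permutations

def best_assignment(cost):
    """Find the task-to-node assignment minimizing total delay.

    cost[i][j] = estimated delay of running task i on node j.
    The Hungarian algorithm computes the same optimum in O(n^3);
    brute force over permutations is used here only as a
    self-contained illustration for a small matrix.
    """
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, best_perm

# Hypothetical delay matrix: 3 tasks on 3 heterogeneous nodes.
delays = [[4, 1, 3],
          [2, 0, 5],
          [3, 2, 2]]

total, plan = best_assignment(delays)
print(total, plan)  # → 5 (1, 0, 2): task 0→node 1, task 1→node 0, task 2→node 2
```

In a heterogeneous cluster, faster nodes yield smaller entries in the delay matrix, so the optimal matching naturally steers heavier tasks toward more capable nodes.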
Keywords/Search Tags:Big data processing, Spark, Graphical operation, Hungarian algorithm