
Design And Implementation Of The Massive Data Computing Platform Based On Spark

Posted on: 2017-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Jiang
Full Text: PDF
GTID: 2348330488959927
Subject: Software engineering
Abstract/Summary:
Data processing technology combines data storage with data computation; its main goal is the mining and analysis of all kinds of data. In recent years, the UC Berkeley AMP Lab has developed a new framework for massive data processing, Spark, which has gradually come into wide view. It not only improves on the earlier popular Hadoop framework, but also introduces the resilient distributed dataset (RDD) abstraction and a more flexible programming model, providing a simpler and more efficient way to process massive data.

With the advent of the massive-data era, many companies encounter the problem of massive data processing and analysis. Existing massive data processing systems suffer from licensing cost, complex operation, non-customizable algorithms, and non-intuitive result presentation. This thesis proposes a Spark-based platform that runs efficiently to achieve massive data storage and computation. The platform also allows users to upload customized algorithms, which can then run after simple configuration. Built on the Webx framework, the system provides its services through a web site, which reduces the learning cost of traditional command-line operation and makes Spark operation graphical. The system also displays analysis results in various ways, which is convenient for researchers' further study.

Firstly, by analyzing the Spark distributed computation framework and the current development of Web technology in detail, the thesis presents the problems of massive data processing and lists the functional and performance requirements of the system. On this basis, the thesis introduces the Webx framework in detail and uses it to implement the web site that presents all system functions visually. Secondly, by analyzing the parallel computation model, MLlib is used to implement classic data mining algorithms.
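To illustrate the RDD-style programming model the platform builds on, the following is a minimal sketch in plain Python. A toy class stands in for a real Spark cluster; the class and method names mirror a small subset of the Spark RDD API, and the word-count job is purely illustrative, not code from the thesis.

```python
# Toy, single-machine stand-in for a Spark RDD, mimicking a subset of
# the RDD API (map / flatMap / reduceByKey / collect) to show the
# functional, pipeline-style programming model the abstract describes.
class MiniRDD:
    def __init__(self, data):
        self.data = list(data)

    def map(self, fn):
        # Apply fn to every element, yielding a new dataset.
        return MiniRDD(fn(x) for x in self.data)

    def flatMap(self, fn):
        # Apply fn (which returns an iterable) and flatten the results.
        return MiniRDD(y for x in self.data for y in fn(x))

    def reduceByKey(self, fn):
        # Merge the values of each key with the given reduce function.
        acc = {}
        for k, v in self.data:
            acc[k] = fn(acc[k], v) if k in acc else v
        return MiniRDD(acc.items())

    def collect(self):
        return self.data

# Classic word-count pipeline expressed against the toy API.
lines = MiniRDD(["spark makes big data simple", "big data needs spark"])
counts = (lines.flatMap(str.split)
               .map(lambda w: (w, 1))
               .reduceByKey(lambda a, b: a + b)
               .collect())
print(dict(counts))  # e.g. {'spark': 2, 'makes': 1, 'big': 2, ...}
```

On a real cluster the same pipeline shape would run partitioned across workers, which is what gives the RDD model its simplicity and efficiency for massive data.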
Then, by analyzing the data storage mechanism, a MySQL database combined with the distributed file system HDFS is used to store user and algorithm information. Finally, the Secure Shell connection technology implements the interaction between the front-end web site and the back-end Spark cluster.
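One way the front-end web site could hand a user-uploaded algorithm to the back-end Spark cluster over Secure Shell is sketched below. The host name, jar path, and class name are illustrative assumptions, not values from the thesis; the command is composed but deliberately not executed.

```python
import shlex

def build_submit_command(host, jar_path, main_class, args):
    # Compose (but do not run) an `ssh <host> spark-submit ...`
    # invocation, quoting each remote argument for the shell.
    remote = " ".join(shlex.quote(p) for p in
                      ["spark-submit", "--class", main_class,
                       "--master", "yarn", jar_path, *args])
    return ["ssh", host, remote]

# Hypothetical job submission for a user-uploaded algorithm.
cmd = build_submit_command("spark@master", "/jobs/user123/kmeans.jar",
                           "cn.example.KMeansJob", ["--k", "8"])
print(" ".join(cmd))
```

In a deployment like the one the abstract describes, the web layer would run such a command after the user's simple configuration step, then collect the job's output for display.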
Keywords/Search Tags: Massive Data, Webx, Spark, Visualization