The Design And Implementation Of A Data Source Connector Based On Spark SQL

Posted on:2019-06-05

Degree:Master

Type:Thesis

Country:China

Candidate:Y Z Tao

GTID:2428330590992468

Subject:Software engineering

Abstract/Summary:

With the advent of the Big Data era, users increasingly need to compute over and store huge amounts of data. Spark, a memory-based distributed computing framework, has been widely recognized by industry in recent years for its excellent performance, and Spark SQL, part of the Spark ecosystem, has become the solution many enterprises adopt when facing massive data analysis and processing problems. At the same time, massive data storage engines are evolving rapidly, and various storage engines have emerged to meet the needs of different business scenarios. Progress in these massive data processing technologies has provided strong support for the development of the big data industry. Enterprise data ends up distributed across different storage engines and systems, so when the data must be brought into a computing engine for general statistical analysis, the problem of data islands usually arises. At present, this problem has been solved to a certain extent by building one-stop big data platforms. However, with storage engines constantly changing, two problems remain open in current massive data processing scenarios: how to connect computing engines quickly and conveniently to data sources on different storage engines, and how to help the computing engine understand and adapt to the storage engine so as to improve computational efficiency. To address these problems, this paper presents Stargate, a data source connector based on Spark SQL. Stargate provides a framework through which different storage engines can connect to the Spark SQL computing engine, and it extracts part of the SQL analysis computation and pushes it down to the data source, so that the characteristics of a storage engine can be used to perform some calculations quickly and make the calculation and analysis process more efficient. The paper designs and implements Stargate and experimentally tests its functionality and performance. The experiments show that Stargate can match the Spark SQL computing engine well with multiple storage engines and can help the computing engine understand and adapt to storage engines, improving the efficiency of computational analysis.
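
The pushdown idea described in the abstract can be illustrated with a small, self-contained sketch. It is not Stargate's code and does not use the Spark API: the classes `EqualTo`, `GreaterThan`, and `KeyValueSource` and the function `run_query` are hypothetical names chosen for illustration. The sketch assumes a storage engine that can answer equality lookups on its key column natively, so the compute side hands that filter to the source and evaluates only the remaining filters itself.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class EqualTo:
    """Filter: column == value (hypothetical, for illustration only)."""
    column: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    """Filter: column > value (hypothetical, for illustration only)."""
    column: str
    value: Any


def evaluate(flt: Any, row: Row) -> bool:
    """Evaluate one filter against one row on the compute-engine side."""
    if isinstance(flt, EqualTo):
        return row[flt.column] == flt.value
    if isinstance(flt, GreaterThan):
        return row[flt.column] > flt.value
    raise TypeError(f"unsupported filter: {flt!r}")


class KeyValueSource:
    """Assumed storage engine: rows are indexed by one key column,
    so equality filters on that column are cheap for the source."""

    def __init__(self, key_column: str, rows: List[Row]) -> None:
        self.key_column = key_column
        self._index: Dict[Any, List[Row]] = {}
        for row in rows:
            self._index.setdefault(row[key_column], []).append(row)
        self.rows_read = 0  # number of rows handed to the compute engine

    def supports(self, flt: Any) -> bool:
        """Report whether the source can evaluate this filter itself."""
        return isinstance(flt, EqualTo) and flt.column == self.key_column

    def scan(self, pushed: List[EqualTo]) -> Iterator[Row]:
        """Return rows matching all pushed filters; full scan if none."""
        if pushed:
            keys = {flt.value for flt in pushed}
            # Two different required key values can never both hold.
            candidates = self._index.get(next(iter(keys)), []) if len(keys) == 1 else []
        else:
            candidates = [r for rows in self._index.values() for r in rows]
        for row in candidates:
            self.rows_read += 1
            yield row


def run_query(source: KeyValueSource, filters: List[Any]) -> List[Row]:
    """Split filters into pushed and residual parts, then execute."""
    pushed = [f for f in filters if source.supports(f)]
    residual = [f for f in filters if not source.supports(f)]
    return [row for row in source.scan(pushed)
            if all(evaluate(f, row) for f in residual)]


rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 50},
    {"id": 2, "amount": 5},
    {"id": 3, "amount": 70},
]
source = KeyValueSource("id", rows)
result = run_query(source, [EqualTo("id", 2), GreaterThan("amount", 20)])
```

In this sketch the equality filter on `id` is answered by the source's index, so only the two rows with `id == 2` reach the compute side, where the residual `amount > 20` filter is applied. Without a pushable filter, the source falls back to a full scan and every row is transferred.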

Keywords/Search Tags:

Big Data, Apache Spark, Spark SQL, Data Source Connector

Related items

1	OCTWAS - Online Check-pointer for Workflows on Apache Spark
2	Using apache spark for scalable gene sequence analysis
3	Research On The Discretization Algorithm Of Big Data Based On Spark
4	Research On Taxi Trajectory Organization Method Based On Apache Spark
5	Research On Apache Spark Distributed Parallel Computing Framework Optimization Technology
6	A Big Data Analyzing Facility Based On Spark Supporting Standard SQL Grammar
7	Design And Implementation Of Telecom 4G Big Data Platform For Network Optimization Based On Spark
8	Research On K-Prototypes Algorithm Based On Mixed Data And Implementation Of Spark Platform
9	A System For Distributed MD Data Analysis Based On Spark
10	Design And Implementation Of A Heterogeneous Data Source Exchange System Based On Spark