The Design and Implementation of a Data Source Connector Based on Spark SQL

Posted on: 2019-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Tao
Full Text: PDF
GTID: 2428330590992468
Subject: Software engineering
Abstract/Summary:
With the advent of the Big Data era, users increasingly need to compute over and store huge amounts of data. Spark, an in-memory distributed computing framework, has been widely recognized by industry in recent years for its performance, and Spark SQL, part of the Spark ecosystem, has become a common solution for enterprises facing massive data analysis and processing problems. At the same time, massive data storage engines are evolving rapidly, and various storage engines have emerged to support the needs of different business scenarios. Progress in these massive data processing technologies has strongly supported the development of the big data industry.

Because enterprise data is distributed across different storage engines and systems, bringing that data into a computing engine for general statistical analysis usually runs into the problem of data islands. Building a one-stop big data platform has solved this problem to a certain extent. However, as storage engines keep changing, two problems remain open in current massive data processing scenarios: how to connect a computing engine quickly and conveniently to data sources on different storage engines, and how to help the computing engine understand and adapt to the storage engine so as to improve computational efficiency.

To address these problems, this thesis presents Stargate, a data source connector based on Spark SQL. Stargate provides a framework through which different storage engines can connect to the Spark SQL computing engine. It also extracts part of the SQL analysis computation and pushes it down to the data source, so that the characteristics of a storage engine can be used to perform some computations quickly and make the overall analysis more efficient. The thesis designs and implements Stargate and tests its functionality and performance experimentally. The experiments show that Stargate can connect the Spark SQL computing engine to multiple storage engines and can help the computing engine understand and adapt to those storage engines, improving the efficiency of computation and analysis.
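
The pushdown idea described in the abstract can be illustrated with a small, self-contained Python sketch. It is not Stargate's design or the Spark SQL API: `Filter`, `ToySource`, `split_filters`, and `query` are hypothetical names introduced only for illustration. The sketch assumes a storage engine that can evaluate some comparison operators natively; the connector hands those filters to the source and leaves the rest for the computing engine to apply afterwards.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class Filter:
    """A simple comparison predicate (hypothetical, for illustration only)."""
    column: str
    op: str  # one of "=", ">", "<"
    value: Any

    def matches(self, row: Row) -> bool:
        v = row[self.column]
        if self.op == "=":
            return v == self.value
        if self.op == ">":
            return v > self.value
        if self.op == "<":
            return v < self.value
        raise ValueError(f"unsupported operator: {self.op}")


class ToySource:
    """Toy storage engine that evaluates only some operators natively."""

    def __init__(self, rows: List[Row], supported_ops: Iterable[str]):
        self.rows = rows
        self.supported_ops = set(supported_ops)
        self.rows_returned = 0  # how many rows left the source on the last scan

    def scan(self, pushed: List[Filter]) -> List[Row]:
        # Filters pushed down are applied inside the source, before any
        # data is handed to the computing engine.
        out = [r for r in self.rows if all(f.matches(r) for f in pushed)]
        self.rows_returned = len(out)
        return out


def split_filters(source: ToySource,
                  filters: List[Filter]) -> Tuple[List[Filter], List[Filter]]:
    """Split filters into those the source can handle and the residual ones."""
    pushed = [f for f in filters if f.op in source.supported_ops]
    residual = [f for f in filters if f.op not in source.supported_ops]
    return pushed, residual


def query(source: ToySource, filters: List[Filter]) -> List[Row]:
    """Scan with pushed-down filters, then apply residual filters engine-side."""
    pushed, residual = split_filters(source, filters)
    rows = source.scan(pushed)
    return [r for r in rows if all(f.matches(r) for f in residual)]
```

In this sketch, a source that supports only `=` and `>` evaluates an `age > 30` filter itself and returns fewer rows, while an `id < 3` filter stays with the engine. The final result is the same as filtering everything engine-side, but less data crosses the boundary between storage and computation.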
Keywords/Search Tags: Big Data, Apache Spark, Spark SQL, Data Source Connector