Research And Implementation On Cross-Platform Unified Big Data SQL Query System

Posted on: 2020-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: L L Yin
GTID: 2518305735451814
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the growing demand for big data analysis across industries, many database systems for big data have emerged. These systems often differ greatly in query languages, data formats, computational models and underlying storage technologies, which greatly increases the complexity of big data analysis and limits the possibility of cross-platform queries. At the same time, many practical businesses need to execute cross-platform queries conveniently and efficiently through SQL statements. Cross-platform querying has increasingly become a research hotspot in academia and industry. To address the ease of use, unification and performance of a cross-platform query system, we study the model and framework of an SQL-based cross-platform unified query system, cross-platform query optimizations, data migration technologies and the complete system design. On this basis, we present a cross-platform unified big data SQL query system termed Sloth. The main work and contributions are as follows.

(1) We propose a unified model for cross-platform queries, which provides users with a unified cross-platform query language, shields the heterogeneity between the underlying platforms and allows users to join tables spanning different platforms. This model processes a cross-platform query submitted by a user in three stages: parsing, optimizing and scheduling. It automatically carries out the subqueries and data migrations specified by the execution plan, so the whole process of a cross-platform query is completely transparent to users.

(2) We propose an advanced two-stage cross-platform query optimizer. The first stage is a rule-based optimizer, which preprocesses the logical plan in three steps: logical plan optimization, subquery partitioning and join reordering. The second stage is a cost-based optimizer, which enumerates all possible execution plans, calculates the cost of each, and uses dynamic programming to reduce overhead, so that the best execution plan is found quickly.

(3) We propose an online method for tuning the physical design of a multi-store system. Based on the query history, common query results are periodically persisted to the appropriate underlying platforms. We use a semantic-based view matching method to replace some subqueries and data migration operations during the execution of a cross-platform query. To optimize the performance of joins in SparkSQL, we also propose a semantic-based method for reusing shuffle data.

(4) Based on the above framework and optimizations, we design and implement Sloth, an efficient cross-platform unified big data SQL query system that integrates SparkSQL, MemSQL and PostgreSQL. Sloth provides users with a unified query language and transparent execution, shields the heterogeneity of the underlying platforms and implements parallel data migration between multiple platforms, so that cross-platform queries are executed automatically and efficiently.

Experiments show that the shuffle data reuse technique proposed in this thesis effectively improves the performance of joins in SparkSQL. Sloth's parallel data migration technique greatly improves data migration performance, reaching up to 8.9 times the performance of MuSQLE. Compared with MuSQLE, SparkSQL, PostgreSQL and MemSQL, Sloth achieves the best performance on cross-platform queries, with a speedup of more than one order of magnitude.
Keywords/Search Tags: Cross-platform, Unified query language, Join, Cross-platform query optimization, Materialized view, Data migration
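To make the cost-based stage of the optimizer in contribution (2) more concrete, the following Python code is a minimal sketch of how dynamic programming over table subsets can choose both a join order and the engine that runs each join, while charging a cost whenever an intermediate result must be migrated between engines. It is not Sloth's optimizer. The cost model is entirely hypothetical: the per-row join costs in `JOIN_COST`, the flat per-row `MOVE_COST`, and the rule that a join's output keeps the larger input's row count are illustrative assumptions. The function names `optimize` and `move_cost` are likewise hypothetical. Only the engine names come from the abstract. As in a classic Selinger-style optimizer, the sketch keeps just the cheapest plan for each (table subset, engine) pair. Because costs are additive and a subset's row estimate does not depend on how it was produced, this pruning avoids costing every complete plan separately.

```python
from itertools import combinations

# Hypothetical per-row join cost on each engine (illustrative numbers, not measured).
JOIN_COST = {"SparkSQL": 1.0, "MemSQL": 0.5, "PostgreSQL": 2.0}
# Hypothetical per-row cost of migrating data between two different engines.
MOVE_COST = 3.0


def move_cost(rows, src, dst):
    """Cost of shipping an intermediate result; free if it is already on dst."""
    return 0.0 if src == dst else rows * MOVE_COST


def optimize(tables):
    """Pick a join order and an execution engine for every join.

    tables maps a table name to (home_engine, row_count). Returns
    (cost, engine, plan) for the cheapest bushy plan, found by dynamic
    programming over table subsets encoded as bitmasks.
    """
    names = sorted(tables)
    n = len(names)
    rows = {}  # bitmask -> estimated row count of the joined subset
    best = {}  # bitmask -> {engine: (cost, plan)} for results living on that engine
    for i, name in enumerate(names):
        home, r = tables[name]
        rows[1 << i] = r
        best[1 << i] = {home: (0.0, name)}  # base tables start on their home engine

    for size in range(2, n + 1):
        for combo in combinations(range(n), size):
            mask = sum(1 << i for i in combo)
            # Hypothetical cardinality model: a join keeps the larger input's size.
            rows[mask] = max(tables[names[i]][1] for i in combo)
            entry = {}
            left = (mask - 1) & mask  # walk all proper, non-empty subsets of mask
            while left:
                right = mask ^ left
                if left < right:  # cost model is symmetric, so visit each split once
                    for lp, (lc, lplan) in best[left].items():
                        for rp, (rc, rplan) in best[right].items():
                            for e, per_row in JOIN_COST.items():
                                cost = (lc + rc
                                        + move_cost(rows[left], lp, e)
                                        + move_cost(rows[right], rp, e)
                                        + (rows[left] + rows[right]) * per_row)
                                if e not in entry or cost < entry[e][0]:
                                    entry[e] = (cost, f"({lplan} JOIN {rplan})@{e}")
                left = (left - 1) & mask
            best[mask] = entry  # keep only the cheapest plan per engine

    final = best[(1 << n) - 1]
    engine = min(final, key=lambda e: final[e][0])
    cost, plan = final[engine]
    return cost, engine, plan


# Hypothetical example: three tables stored on three different engines.
tables = {
    "orders": ("PostgreSQL", 1000),
    "customers": ("MemSQL", 100),
    "lineitem": ("SparkSQL", 5000),
}
print(optimize(tables))
# → (10400.0, 'SparkSQL', '(lineitem JOIN (customers JOIN orders)@SparkSQL)@SparkSQL')
```

With these made-up numbers the sketch runs every join on SparkSQL and migrates the two smaller tables there instead of moving `lineitem`; other cost constants would lead to a different plan. A real system would also need statistics-based cardinality estimates and the rule-based preprocessing of the first stage, which this sketch omits.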