Research And Implementation Of Cross-platform Unified Big Data Intelligent SQL Query System

Posted on: 2021-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 2428330647451071
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the increasing demand for big data analysis and processing across industries, big data query systems have diversified. These systems differ in query language, computing model, system architecture, and underlying storage technology, each being designed for a different application scenario. To handle diverse business needs, modern enterprises and organizations therefore usually deploy several different data query systems. However, many comprehensive business tasks require convenient and efficient cross-platform data queries, for example the unified analysis of data across departments within the same organization. How to make full use of the characteristics of different computing platforms to perform efficient and convenient cross-platform data queries has thus become a research hotspot in academia and industry. Existing work has deficiencies in model structure and query performance and struggles to meet the practical requirements of complex cross-platform data queries. To address these shortcomings, this thesis proposes a cross-platform unified SQL query model and a performance optimization method for it, and on this basis designs and implements Coral, a cross-platform unified big data intelligent SQL query system. The main research contents and contributions of this thesis are:

(1) A cross-platform unified SQL query model is proposed. The model provides users with a unified query language that supports cross-platform SQL queries and hides the heterogeneity of the underlying execution platforms, allowing users to join tables stored in different underlying databases within a single query statement. The model processes each submitted query through statement parsing, query optimization, and execution scheduling, and automatically completes sub-query scheduling and data migration according to the execution plan. The entire cross-platform query process is transparent to users.

(2) A hybrid cross-platform query optimizer combining rule-driven and data-driven approaches is designed. The system contains two optimizers for different query scenarios. The Cascades-based optimizer integrates conversions between data sources into the Cascades optimizer as empirical rules, achieving cross-platform query optimization that reduces data migration and query execution overhead while retaining the advantages of the Cascades optimizer. The optimizer based on Deep Q Network (DQN) uses machine learning techniques to generate a dataset-specific join search strategy. Given a cost model and a search space, this optimizer optimizes the search over all possible join plans for a specific dataset and learns search strategies from the results of previous plan instances, thereby significantly reducing the search time for the execution plan and improving the optimization effect.

(3) A frequent sub-query cache model based on query history analysis is designed. Based on query history information, the model periodically persists frequently used query results to the appropriate underlying platform (i.e., it generates cache views). It then reuses the cache view data through a semantic-based view matching algorithm to replace some sub-queries and data migration operations in cross-platform queries, thereby reducing migration overhead and improving cross-platform query performance.

(4) Based on the query model and performance optimization methods above, this thesis designs and implements Coral, an efficient cross-platform unified big data intelligent SQL query system. The system integrates three mainstream databases with different characteristics: MemSQL, ClickHouse, and PostgreSQL. Coral provides users with a unified cross-platform query language that hides the heterogeneity of the underlying platforms, provides a cross-platform query optimizer, and improves execution performance through code generation (Codegen) technology. It can automatically and efficiently execute cross-platform queries submitted by users, offering both platform transparency and execution transparency.

(5) The performance of Coral and the related optimization techniques proposed in this thesis is evaluated and analyzed experimentally. The experiments show that the hybrid cross-platform query optimizer, by combining the traditional optimizer with the dataset-specific optimizer, effectively reduces the time needed to generate an execution plan and improves the efficiency of the resulting plan. The cache model has an obvious effect when a sub-query is accessed repeatedly. Compared with the mainstream cross-platform query systems MuSQLE and Sloth, Coral achieves better cross-platform query performance, with an average speedup of 5.03 times over MuSQLE and 2.04 times over Sloth.
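
The idea behind the frequent sub-query cache in contribution (3) can be illustrated with a minimal Python sketch. It is not Coral's implementation: the class `SubqueryCache`, the `normalize` helper, the `threshold` parameter, and the `execute` callback are all hypothetical names introduced here. The sketch also simplifies in two ways: it matches sub-queries by normalized text rather than by semantics, and it caches results inline in memory rather than periodically persisting them to an underlying platform.

```python
from collections import Counter


def normalize(sql: str) -> str:
    """Crude canonical form: lowercase and collapse whitespace.

    Assumption for illustration only; the abstract describes a
    semantic-based view matching algorithm, which this is not.
    """
    return " ".join(sql.lower().split())


class SubqueryCache:
    """Hypothetical history-based cache of frequent sub-query results."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold  # times a sub-query must be seen before caching
        self.history = Counter()    # normalized sub-query -> times seen
        self.views = {}             # normalized sub-query -> cached rows

    def run(self, sql, execute):
        """Return rows for `sql`, reusing a cached view when one exists."""
        key = normalize(sql)
        self.history[key] += 1
        if key in self.views:
            # Cache hit: skip sub-query execution (and any data migration).
            return self.views[key]
        rows = execute(sql)
        if self.history[key] >= self.threshold:
            # Frequent enough: keep the result as a "cache view".
            self.views[key] = rows
        return rows
```

With `threshold=2`, the first two occurrences of a sub-query are executed by the `execute` callback, and later occurrences are answered from the cached view, which is the case where the abstract reports the cache having an obvious effect.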
Keywords/Search Tags:Cross-platform, Data-driven, Join, Query Optimization, Sub-query Cache