
Join Optimization In Distributed Database System

Posted on:2018-08-20
Degree:Master
Type:Thesis
Country:China
Candidate:L Wang
GTID:2348330512494144
Subject:Software engineering
Abstract/Summary:
The recent growth of the Internet has produced ever-larger volumes of data. To handle highly concurrent query requests over this "big data", the ability of a database system to scale out storage and computation has become increasingly important, and distributed databases with good scalability have attracted attention from both industry and academia. This thesis studies join query optimization under a scalable distributed storage architecture. We identify three factors that affect the efficiency of a join query: local data extraction, data transmission across the network, and the efficiency of the join operators. Based on the distributed storage architecture and these three factors, we design a join optimization framework for distributed databases that reduces query response time and improves the user experience. The main contributions are threefold:

1. We propose an optimization framework for join queries under a distributed database architecture. Join queries are improved by optimizing parallel execution, the join operators, and the semi-join operation. We implement the framework on OceanBase, an open source distributed database.
2. Based on the optimization framework and the open source system OceanBase, we design and implement several parallel join operators, including nested loop join and hash join, together with a semi-join operation. Parallelism is a key pillar of join efficiency. On the one hand, pulling and processing data in parallel accelerates local data extraction and also reduces network transmission in join operations (e.g. semi-join). On the other hand, it makes full use of computing resources.
3. We conduct comprehensive experiments to evaluate the validity and efficiency of the distributed query optimization framework, using Sysbench as the benchmark.

The experimental results indicate that the parallel join operators reduce response time and improve query efficiency, and that all three aspects of the framework (parallelism, the join operators, and the semi-join operation) contribute to system performance. The proposed framework also serves as a reference for other scalable distributed storage architectures and for join query optimization in general.
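The three factors named in the abstract (parallel local extraction, reduced network transfer via semi-join, and the join operator itself) can be illustrated with a small in-memory sketch. This is not the thesis's OceanBase implementation; the tables, the partition layout, and every function name below are hypothetical, and threads stand in for storage nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: "orders" is split across two storage nodes,
# "customers" lives on a third node. Rows are plain tuples.
ORDER_PARTITIONS = [
    [(1, "c1", 30), (2, "c2", 15)],
    [(3, "c1", 20), (4, "c9", 50)],
]
CUSTOMERS = [("c1", "Ann"), ("c2", "Bob"), ("c3", "Cid"), ("c4", "Dee")]


def scan_partition(partition):
    """Stand-in for local data extraction on one storage node."""
    return list(partition)


def semi_join_filter(rows, keys):
    """Runs on the remote side: keep only rows whose key can match."""
    return [row for row in rows if row[0] in keys]


def hash_join(build_rows, probe_rows):
    """Hash join: build a table on customers, probe it with orders."""
    table = {}
    for cust_id, name in build_rows:
        table.setdefault(cust_id, []).append(name)
    return [
        (order_id, cust_id, amount, name)
        for order_id, cust_id, amount in probe_rows
        for name in table.get(cust_id, [])
    ]


def distributed_join():
    # 1. Pull all order partitions in parallel.
    with ThreadPoolExecutor(max_workers=len(ORDER_PARTITIONS)) as pool:
        scanned = list(pool.map(scan_partition, ORDER_PARTITIONS))
    orders = [row for part in scanned for row in part]

    # 2. Semi-join: send only the distinct join keys to the customer
    #    side, so that only customers that can match are shipped back.
    keys = {cust_id for _, cust_id, _ in orders}
    shipped = semi_join_filter(CUSTOMERS, keys)

    # 3. Join the reduced customer set with the orders locally.
    return sorted(hash_join(shipped, orders)), len(shipped)


if __name__ == "__main__":
    rows, shipped_count = distributed_join()
    print(rows)
    print("customer rows shipped:", shipped_count, "of", len(CUSTOMERS))
```

In this toy data set the semi-join step ships two of the four customer rows, which is the kind of network saving the abstract attributes to the semi-join operation. A real system would also weigh the cost of sending the key set itself.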
Keywords/Search Tags:Distributed Storage, Distributed Database, Join Query, Query Optimization, Optimization Framework