Optimization Of Query Algorithm For Distributed Relational Database

Posted on: 2021-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: M Fan
Full Text: PDF
GTID: 2428330623467809
Subject: Computer Science and Technology
Abstract/Summary:
With the growth of data volume and changing application scenarios, database system architecture has changed dramatically. The emergence of distributed relational databases (NewSQL) merges the SQL and NoSQL approaches: NewSQL provides an external SQL interface, distributed transactions, and high scalability. Because the volume of stored data has grown compared with traditional relational databases, the application scenarios of NewSQL databases are no longer limited to online transaction processing; they also include large-scale complex analytical queries and offline analysis. Now that the basic functions of NewSQL, including distributed transactions and the mapping of SQL onto NoSQL storage, are largely mature, improving query performance for large complex analytical queries has become the key problem for NewSQL databases.

In traditional relational databases, the query optimizer addresses this problem: it selects the least expensive plan from hundreds or even thousands of candidate query plans according to estimated cost. In a distributed environment, however, cost estimation becomes more difficult. As a result, it is harder to select the best query plan through query optimization techniques, which reduces query performance. For large complex analytical queries, even after the optimizer generates a suboptimal plan, the robustness of the plan can be ensured by reducing network overhead in the distributed setting, which limits the impact of executing that plan on database performance.

In this thesis, I built a cluster experimental environment based on TiDB, an open-source distributed relational NewSQL database. I also designed and implemented the distLIP algorithm, based on the Lookahead Information Passing (LIP) algorithm. In distLIP, the network transmission overhead is greatly reduced by pushing the distLIP operator down to the storage layer for computation. The robustness of the query plan is improved through the adaptive sorting algorithm, which reduces memory usage by shrinking intermediate results. A cuckoo filter is also used in place of the Bloom filter in the original LIP algorithm, which improves space utilization and the computational efficiency of queries. Finally, I use the Star Schema Benchmark (SSB) to measure query execution time with and without distLIP for large complex analytical queries. Experimental results show that distLIP on TiDB is effective and can improve the performance of star schema queries and similar query scenarios.
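
The filtering idea behind LIP can be illustrated with a minimal Python sketch. It is not the thesis's distLIP implementation. It assumes that the adaptive sorting mentioned above refers to reordering the per-dimension filters by their observed pass rate, so that the most selective filter is probed first. The names `DimFilter` and `lip_scan`, the batch size, and the use of plain Python sets in place of Bloom or cuckoo filters are all hypothetical, and storage-layer pushdown in TiDB is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DimFilter:
    """Stand-in for a per-dimension membership filter.

    A real system would use a Bloom or cuckoo filter; a plain set is
    used here (an assumption) so the sketch has no false positives.
    """
    column: str
    keys: set
    probes: int = 0
    passes: int = 0

    def contains(self, key) -> bool:
        # Record statistics so filters can be reordered later.
        self.probes += 1
        hit = key in self.keys
        self.passes += hit
        return hit

    def pass_rate(self) -> float:
        # Unprobed filters are treated as non-selective (rate 1.0).
        return self.passes / self.probes if self.probes else 1.0


def lip_scan(fact_rows, filters, batch_size=2):
    """Return (surviving fact rows, final filter order by column name).

    Each row is probed against the filters in the current order, and
    probing stops at the first filter that rejects it. After every
    batch the filters are re-sorted by observed pass rate, lowest first.
    """
    order = list(filters)
    survivors = []
    for start in range(0, len(fact_rows), batch_size):
        for row in fact_rows[start:start + batch_size]:
            # all() short-circuits on the first rejecting filter.
            if all(f.contains(row[f.column]) for f in order):
                survivors.append(row)
        order.sort(key=DimFilter.pass_rate)
    return survivors, [f.column for f in order]
```

In this sketch, a filter that rejects many rows moves to the front after the first batch, so later rows that fail it are discarded after a single probe and never reach the remaining filters.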
Keywords/Search Tags: Query optimization, Execution engine, NewSQL database, TiDB, Cuckoo filter