
The Design and Implementation of a Distributed SQL Execution Plan Generation System Based on MPP Technology

Posted on: 2015-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: T Wang
GTID: 2298330452961277
Subject: Software engineering
Abstract/Summary:
As the demand for analyzing ever larger datasets grows, traditional hardware and software solutions become increasingly expensive and still cannot satisfy the needs of big-data analysis. This paper presents a new system, Whale, designed for large-scale data analysis through distributed SQL execution. It scales well to big-data scenarios while maintaining high analysis speed, because it benefits from an MPP shared-nothing parallel architecture: it uses MySQL as the back-end analysis engine, builds a parallel execution engine on top of it, and stores large data in HDFS for storage reliability.

First, I survey data warehousing, parallel computing, and distributed architectures for this project. The objective of the project is to solve several problems in current big-data analysis products, both open source and commercial; the key points are scalability, fault tolerance, and complex SQL analysis.

Then, this paper describes the Master-Slave architecture and the execution flow of the project. The system is developed on the Windows platform (mainly using Eclipse) and deployed on a Linux cluster. It is divided into a server and a client, which interact through remote procedure calls. The client module is responsible for parsing the user's command, sending the request to the server, fetching the results from the server, and displaying them to the user. The core step in this process is generating a query plan for the SQL the user types in; the relevant code is the parser and the optimizer. The server comprises a master daemon and slave daemons: the master is deployed on the central server node, and the slave daemons are deployed on the slave server nodes. The master receives the execution plan from the client and executes it through data-sharding parallel processing, directing a series of slave nodes, and the slaves perform the actual data analysis using the back-end MySQL engine.
A meta-data module is also integrated into the master; it manages the meta-data of the whole system, including table schemas, the table data split allocation table, and so on. This architecture is elegant and lightweight: by using different engines in MySQL, we can obtain high performance in different data analysis scenarios, and the upper-level architecture makes the system easy to scale out by adding new nodes as the volume of data to be analyzed grows. Through this design, the system addresses the problems of complexity, scalability, and parallelism, meets big-data requirements, and improves the user experience.

Finally, I carry out tests at several levels, including unit tests, functional tests, and performance tests, and show the demonstration results of a system deployment. Compared with other similar systems, the test results reveal our system's advantages in big-data analysis.
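
As an illustration of the data-sharding parallel processing described in the abstract, the following is a minimal sketch and not the thesis's implementation. It assumes a simple scatter-gather scheme: a master sends the same partial aggregate query to every slave, each slave runs it on its own shard, and the master merges the partial results. All names (`SHARDS`, `run_on_slave`, `master_execute`) are hypothetical, the in-process `sqlite3` engine stands in for each slave's back-end MySQL, and a thread pool stands in for the remote procedure calls.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each inner list stands for the rows held by one slave node.
SHARDS = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]


def run_on_slave(rows, partial_sql):
    """Simulate one slave: load its shard into a local engine and run the sub-query."""
    conn = sqlite3.connect(":memory:")  # assumption: stand-in for the slave's MySQL
    try:
        conn.execute("CREATE TABLE t (k TEXT, v INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(partial_sql).fetchall()
    finally:
        conn.close()


def master_execute(shards):
    """Simulate the master: scatter a partial aggregate, then merge the results."""
    partial_sql = "SELECT k, SUM(v) FROM t GROUP BY k"
    # Scatter: one worker per shard, standing in for one RPC per slave node.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: run_on_slave(rows, partial_sql), shards))
    # Gather: per-shard sums for the same key are added together on the master.
    merged = {}
    for part in partials:
        for key, subtotal in part:
            merged[key] = merged.get(key, 0) + subtotal
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    print(master_execute(SHARDS))
```

The merge step works here because `SUM` can be computed from per-shard partial sums; aggregates such as `AVG` would need the slaves to return both a sum and a count.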
Keywords/Search Tags: Master-Slave architecture, Execution plan generation, Remote procedure call, Distributed SQL execution processing, Data sharding parallel processing