
The Design And Implementation Of Data Locality Subsystem In HAWQ

Posted on: 2017-02-25 | Degree: Master | Type: Thesis
Country: China | Candidate: X Sheng | Full Text: PDF
GTID: 2308330485961846 | Subject: Engineering
Abstract/Summary:
With the rapid decrease in the cost of storing data on Hadoop, the way enterprises store and process data is changing continuously. More and more enterprises are building data centers on HDFS. At the same time, more and more databases are putting SQL on Hadoop, combining the simplicity of SQL with the scalability and robustness of Hadoop distributed storage, so that people can interact with data on Hadoop using SQL.

HAWQ, one of the SQL-on-Hadoop products, stores its data on HDFS and uses segments to execute query plans. When a segment queries data stored on another host, the read goes over the network, and the resulting latency degrades query performance. A data locality subsystem is needed to address this problem.

This paper introduces several SQL-on-Hadoop products and their advantages and disadvantages. It analyzes the root cause of the network latency and the requirements for solving it, and then describes a data locality subsystem that, based on those requirements, allocates data blocks to the appropriate compute units of HAWQ. The subsystem calculates the number of compute units and allocates the data blocks to the virtual segments, which are the units that actually execute the query plan. HAWQ uses this allocation to maximize the amount of data read locally: the more data is read locally, the faster the query. The paper also presents the implementation of the subsystem and, finally, tests it. From the test results, the paper concludes that the subsystem is reasonable and efficient.
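
The allocation idea described above can be illustrated with a minimal sketch. It is not the algorithm from the thesis or from HAWQ itself, only a simple greedy heuristic under stated assumptions: each block is known by its size and the hosts that hold a replica, each virtual segment runs on exactly one host, and a block is given to the least-loaded segment on a replica host when one exists. The names `Block`, `allocate_blocks`, and `segment_hosts` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: str
    size: int              # block size in bytes
    replica_hosts: tuple   # hosts that hold a replica of this block


def allocate_blocks(blocks, segment_hosts):
    """Assign each block to a virtual segment, preferring local replicas.

    segment_hosts maps a (hypothetical) virtual segment id to its host name.
    Returns (assignment, local_bytes, total_bytes), where assignment maps
    block_id to the chosen segment id.
    """
    if not segment_hosts:
        raise ValueError("at least one virtual segment is required")

    load = {seg: 0 for seg in segment_hosts}   # bytes assigned per segment
    segments_on_host = defaultdict(list)
    for seg, host in segment_hosts.items():
        segments_on_host[host].append(seg)

    assignment = {}
    local_bytes = 0
    total_bytes = 0
    # Place large blocks first so that smaller ones can even out the load.
    for block in sorted(blocks, key=lambda b: (-b.size, b.block_id)):
        # Segments that can read this block without crossing the network.
        local = [s for h in block.replica_hosts
                 for s in segments_on_host.get(h, [])]
        # Fall back to any segment (a remote read) if no replica is local.
        candidates = local or list(load)
        chosen = min(candidates, key=lambda s: (load[s], s))
        assignment[block.block_id] = chosen
        load[chosen] += block.size
        total_bytes += block.size
        if local:
            local_bytes += block.size
    return assignment, local_bytes, total_bytes


if __name__ == "__main__":
    blocks = [
        Block("b1", 128, ("host1", "host2")),
        Block("b2", 128, ("host2", "host3")),
        Block("b3", 64, ("host4",)),   # no segment runs on host4
    ]
    segment_hosts = {"seg0": "host1", "seg1": "host2"}
    assignment, local_bytes, total_bytes = allocate_blocks(blocks, segment_hosts)
    print(assignment)
    print(f"local read ratio: {local_bytes / total_bytes:.2f}")
```

This heuristic always favors a local read over load balance, so a host that holds many replicas can end up with overloaded segments. A practical allocator would likely need to trade locality against skew, which the sketch does not attempt.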
Keywords/Search Tags: Multiple backup data, Distributed database, Elastic execution, Data locality