The Design And Implementation Of Data Locality Subsystem In HAWQ

Posted on:2017-02-25

Degree:Master

Type:Thesis

Country:China

Candidate:X Sheng

GTID:2308330485961846

Subject:Engineering

Abstract/Summary:

As the cost of storing data on Hadoop falls rapidly, the way enterprises store and process data is changing continuously. More and more enterprises are building data centers on HDFS. At the same time, more and more databases are putting SQL on Hadoop, combining the simplicity of SQL with the scalability and robustness of Hadoop distributed storage, so that people can interact with data on Hadoop using SQL.

HAWQ, one of the SQL-on-Hadoop products, stores its data on HDFS and uses segments to execute query plans. Network latency may arise when a segment has to read data stored on another host, and this latency hurts query performance. A data locality subsystem is therefore needed to address this problem.

This paper introduces several SQL-on-Hadoop products and discusses their advantages and disadvantages. It analyzes the root cause of the network latency and the requirements for solving the problem. Based on these requirements, it then describes a data locality subsystem that allocates data blocks to the appropriate compute units of HAWQ. The subsystem calculates the number of compute units and allocates the data blocks to the different virtual segments, which are the units that actually execute the query plan. HAWQ uses this allocation to maximize the amount of data that is read locally: the more data is read locally, the faster the query. The paper also presents the implementation of the subsystem and, finally, reports tests of it. From the test results, the paper concludes that the subsystem is reasonable and efficient.
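
The allocation idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm or HAWQ's code: it is a simple greedy heuristic, and the names `Block`, `allocate_blocks`, and `segment_hosts` are hypothetical. It assumes each block's replica hosts are known and that each virtual segment runs on exactly one host. Each block goes to the least-loaded segment on a host that holds a replica, and falls back to a remote read only when no such segment exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical description of one HDFS data block."""
    block_id: str
    size: int              # block size in bytes
    replica_hosts: tuple   # hosts that hold a replica of this block


def allocate_blocks(blocks, segment_hosts):
    """Greedily assign every block to one virtual segment.

    segment_hosts maps a (hypothetical) virtual segment id to its host
    and must not be empty.
    Returns (assignment, local_bytes, total_bytes).
    """
    load = {seg: 0 for seg in segment_hosts}  # bytes assigned per segment
    assignment = {}
    local_bytes = 0
    total_bytes = 0
    # Place large blocks first so smaller ones can even out the load.
    for block in sorted(blocks, key=lambda b: b.size, reverse=True):
        # Segments whose host stores a replica can read the block locally.
        local = [s for s, h in segment_hosts.items()
                 if h in block.replica_hosts]
        # With no local candidate, any segment will do (remote read).
        candidates = local or list(segment_hosts)
        # Pick the least-loaded candidate; break ties by segment id.
        target = min(candidates, key=lambda s: (load[s], s))
        assignment[block.block_id] = target
        load[target] += block.size
        total_bytes += block.size
        if local:
            local_bytes += block.size
    return assignment, local_bytes, total_bytes


if __name__ == "__main__":
    blocks = [
        Block("b1", 128, ("h1", "h2")),
        Block("b2", 128, ("h2", "h3")),
        Block("b3", 64, ("h4",)),   # no segment runs on h4
    ]
    segments = {"seg0": "h1", "seg1": "h2"}
    assignment, local_bytes, total_bytes = allocate_blocks(blocks, segments)
    print(assignment)
    print(f"local read ratio: {local_bytes / total_bytes:.2f}")
```

The ratio of `local_bytes` to `total_bytes` is one way to quantify how much of a scan is served by local reads. A real allocator would also have to decide how many virtual segments to start, which this sketch takes as given.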

Keywords/Search Tags:

Multiple backup data, Distributed database, Elastic execution, Data locality

Related items

1	Research On Distributed Data Full Backup And Incremental Backup Of File System
2	Automotive Industry Mes System Data Protection
3	Research On A Distributed Databases Backup And Recovery Mechanism Based On Multi-platform
4	Research On A Distributed Heterogeneous Database Backup And Recovery Mechanism
5	Research On The Data Restoration Technology Of Distributed Database System
6	Realization Of Distributed Data Storage And Backup
7	The Securities Industry Information System And Application Research On Backup Capacity Building
8	Design And Implementation Of Private Data Backup And Recovery Applications Based On Java
9	Research On File Similarity-Based Deduplication In Network Backup
10	Research On Duplicate Data Detection In Data Deduplication