Analysis And Parallelization For Join Algorithms Based On Persistent In-memory File System

Posted on:2017-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:L W Zhao

Full Text:PDF

GTID:2348330509453996

Subject:Computer software and theory

Abstract/Summary:

Join is a fundamental operation in relational databases and the most expensive one, so it has a great impact on database performance. To meet the demand for high performance and low power consumption, industry and academia have turned to fast, byte-addressable non-volatile memory (NVM) to store data persistently. As a consequence, many file systems have been developed for NVM. These file systems access data differently from traditional file systems for block devices. Because database tables are stored in the file system, the performance of the file system largely determines the performance of the join operation. However, no previous work has investigated the bottlenecks of join operations on in-memory file systems.

In this thesis, we use SIMFS (Sustainable In-Memory File System) to study how the I/O path of an in-memory file system differs from that of a file system for block devices (EXT4). We then implement Nested Loop Join, Sort Merge Join, and Hash Join and test their performance on SIMFS and EXT4 with various block sizes. We analyze how the in-memory file system improves join performance, the impact of block size on join, the relative performance of the three join algorithms, and the data access characteristics of join. The experimental results show that all three join algorithms run much faster on SIMFS than on EXT4. Block size affects the two file systems differently: the traditional block-based file system is more sensitive to it. Hash Join and Sort Merge Join reduce data accesses through hashing and sorting, so they are faster than Nested Loop Join, and Hash Join is faster than Sort Merge Join. The size of these performance differences varies between file systems. Data access time also differs between file systems, and the in-memory file system greatly reduces it.

Based on this analysis, we propose optimization techniques for join operations. Unlike join on file systems for block devices, join on an in-memory file system must take computation time into account, not just data access cost. We therefore redesign Hash Join with OpenMP to parallelize data access and hashing. The results show that parallelization improves the performance of data access and hashing by 40%; even in the worst case, the improvement is 11%.
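The idea of parallelizing the data access and hashing phase of Hash Join can be illustrated with a minimal sketch. The thesis itself uses OpenMP; the version below swaps in a Python thread pool purely to show the structure (split the build input, hash each chunk independently, merge, then probe). The function names, the chunking policy, and the tuple-based row format are all assumptions made for illustration and are not taken from the thesis. A nested loop join is included only as a reference for checking the result.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partition(rows, key_index):
    """Hash one chunk of the build relation into a private table."""
    table = defaultdict(list)
    for row in rows:
        table[row[key_index]].append(row)
    return table


def parallel_hash_join(build_rows, probe_rows, build_key=0, probe_key=0, workers=4):
    """Equi-join two lists of tuples; the build phase runs one task per chunk."""
    # Split the build relation into roughly equal chunks (hypothetical policy).
    chunk = max(1, (len(build_rows) + workers - 1) // workers)
    chunks = [build_rows[i:i + chunk] for i in range(0, len(build_rows), chunk)]

    # Each worker reads and hashes its own chunk; no shared state is written.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_tables = list(pool.map(lambda c: build_partition(c, build_key), chunks))

    # Merge the private tables into one lookup table, preserving chunk order.
    table = defaultdict(list)
    for partial in partial_tables:
        for key, rows in partial.items():
            table[key].extend(rows)

    # Probe phase: one hash lookup per probe row.
    return [b + p for p in probe_rows for b in table.get(p[probe_key], [])]


def nested_loop_join(build_rows, probe_rows, build_key=0, probe_key=0):
    """Reference join that compares every pair of rows."""
    return [b + p for p in probe_rows for b in build_rows if b[build_key] == p[probe_key]]
```

Giving each worker a private table avoids locking during the build phase, at the cost of a merge step afterwards. Because of Python's global interpreter lock, this sketch is not expected to reproduce the speedups reported in the thesis; it only shows how the work is divided.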

Keywords/Search Tags:

Join operations, In-memory file system, Block-based file system, Performance optimization, Parallel

Related items

1	Research On File System Optimization And Shared File System For Non-Volatile Memory
2	Key Technologies Of NVM Block Driver For Traditional File Systems
3	The Optimization Method Research For Small File Data Storage Performance On Hadoop Distributed File System
4	Research On Performance Modeling And Application Of Distributed File System
5	The Design And Implementation Of The Parallel Network File System PNFS
6	Performance Optimization And Implementation Of Persistent File System For Non-volatile Memory
7	Research Of Flash File System Based On Large Capacity NAND Flash Memory
8	Scalable capability-based authorization for high-performance parallel file systems
9	File System Based On Persistent Memory
10	Research On Parallel File System In Search Engine