Analysis And Parallelization For Join Algorithms Based On Persistent In-memory File System

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: L W Zhao
Full Text: PDF
GTID: 2348330509453996
Subject: Computer software and theory
Abstract/Summary:
Join is the fundamental and most expensive operation in a relational database, and it has a great impact on database performance. To meet the demand for high performance and low power consumption, industry and academia have exploited fast, byte-addressable non-volatile memory (NVM) to store data persistently. As a consequence, many file systems have been developed for NVM. These file systems access data differently from traditional file systems for block devices. For database systems, since data tables are stored in the file system, the performance of the file system essentially determines the performance of the join operation. However, no previous work has investigated the bottleneck of the join operation on in-memory file systems.

In this paper, we use SIMFS (Sustainable In-Memory File System) to study how the I/O path of an in-memory file system differs from that of a file system for block devices (EXT4). We then implement Nested Loop Join, Sort Merge Join, and Hash Join and test their performance on SIMFS and EXT4 with various block sizes.

We analyze the optimization of join operations on in-memory file systems, the impact of block size on join, the relative performance of the three join algorithms, and the data access characteristics of join. The experimental results show that all three join operations gain substantially in performance when running on SIMFS compared with EXT4. Block size affects different file systems differently; traditional block-based file systems are more sensitive to it. Because Hash Join and Sort Merge Join reduce data accesses by hashing and sorting, they are faster than Nested Loop Join, and Hash Join is faster than Sort Merge Join. The size of these performance differences varies across file systems. Different file systems also exhibit different data access times.
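To make the comparison among the three join algorithms concrete, the following is a minimal in-memory sketch in Python. It is not the thesis implementation. The function names, the `(key, value)` row layout, and the sample tables `R` and `S` are assumptions chosen for illustration, and the sketch ignores file system I/O entirely. It only shows why hashing and sorting reduce the number of row accesses compared with the nested loop.

```python
from collections import defaultdict


def nested_loop_join(r, s):
    """Compare every row of r with every row of s: len(r) * len(s) comparisons."""
    return [(rk, rv, sv) for rk, rv in r for sk, sv in s if rk == sk]


def sort_merge_join(r, s):
    """Sort both inputs on the join key, then merge them in a single pass."""
    r = sorted(r, key=lambda row: row[0])
    s = sorted(s, key=lambda row: row[0])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            # Find the end of the run of rows in s that share this key.
            j_end = j
            while j_end < len(s) and s[j_end][0] == key:
                j_end += 1
            # Pair every r row with this key against that run of s rows.
            while i < len(r) and r[i][0] == key:
                out.extend((key, r[i][1], s[k][1]) for k in range(j, j_end))
                i += 1
            j = j_end
    return out


def hash_join(r, s):
    """Build a hash table on r, then probe it once per row of s."""
    table = defaultdict(list)
    for rk, rv in r:
        table[rk].append(rv)
    return [(sk, rv, sv) for sk, sv in s for rv in table.get(sk, ())]


# Hypothetical sample tables: rows are (join key, payload).
R = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
S = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]
print(sorted(hash_join(R, S)))
```

All three functions return the same set of joined rows, possibly in a different order. The nested loop touches every pair of rows, while the other two touch each row a small number of times after the sort or build phase.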
The in-memory file system shows a great reduction in data access time.

Based on the analysis of the results, we propose optimization techniques for the join operation. Unlike joins on file systems for block devices, joins on an in-memory file system must take computation time into account, not just data access cost. We therefore redesign Hash Join with OpenMP to parallelize data access and hashing. The results show that parallelizing data access and hashing improves performance by 40%; even in the worst case, performance improves by 11%.
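The idea of parallelizing data access and hashing can be sketched structurally as follows. The thesis uses OpenMP; this Python sketch swaps in a `concurrent.futures` thread pool as a stand-in, so it illustrates only the shape of the approach, not the thesis code. The names `build_partial` and `parallel_hash_join`, the chunking scheme, and the `workers` parameter are assumptions. In standard CPython the global interpreter lock limits the speedup of CPU-bound threads, so the sketch should not be used to reproduce the reported gains.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_partial(chunk):
    """Read one chunk of the build table and hash it into a private table."""
    table = defaultdict(list)
    for key, value in chunk:
        table[key].append(value)
    return table


def parallel_hash_join(r, s, workers=4):
    """Hash join whose build phase (data access + hashing) runs in parallel."""
    # Split the build table r into roughly equal chunks, one per worker.
    size = max(1, -(-len(r) // workers))  # ceiling division
    chunks = [r[i:i + size] for i in range(0, len(r), size)]

    # Each worker scans its chunk and builds a partial hash table.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(build_partial, chunks))

    # Merge the partial tables into one table (sequential in this sketch).
    table = defaultdict(list)
    for partial in partials:
        for key, values in partial.items():
            table[key].extend(values)

    # Probe phase: look up each row of s in the merged table.
    return [(sk, rv, sv) for sk, sv in s for rv in table.get(sk, ())]
```

Giving each worker a private table avoids locking during the build, at the cost of a merge step afterwards. Whether the thesis uses private tables, a shared table, or another scheme is not stated in the abstract.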
Keywords/Search Tags: Join operations, In-memory file system, Block-based file system, Performance optimization, Parallel