Font Size: a A A

Research On Tiered Hashing Index And Incremental Resize For Solid State Drive

Posted on:2022-07-15Degree:MasterType:Thesis
Country:ChinaCandidate:L ShiFull Text:PDF
GTID:2518306572491024Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
The raise of big data technologies challenges the performance and capacity of data storage.The Unified Memory-Storage Hierarchy(UMH)employs PCIe interconnect technology empowered byte-addressable solid-state drives(SSDs)to overcome the DRAM capacity limitation and reduce massive data copies between main memory and external storage in the traditional hierarchical storage architectures.However,hash indexes which is commonly used in in-memory databases confront two problems under UMH.First,hash indexes use hash functions to distribute key-value pairs evenly and randomly across the allocated space,thus,cause enormous random I/O and increase SSD's garbage collection overhead.Second,hash resizing introduces substantial data migration,resulting in I/O contention,hence,increases the long-tail latency of operations including insertion and point query.Eventually,such a hash resizing operation significantly increases the write amplification in SSDs and hinder the performance.Therefore,redesigning hash indexes to improve the performance and lifetime of SSDs becomes a crucial research topic.To address the write amplification problem caused by random writes in hash indexes,we propose Tiered Hashing for byte-addressable SSDs.Tiered Hashing writes to the upper hash table first.When there are no open slots in the upper layer,it then writes to a lower layer and adopts the rent-in-advance algorithm(RIA)to migrate the upper data to the updated bucket.The RIA increases the number of open slots in the upper table and achieves skewed write frequency across layers.Finally,multi-stream technology is employed to reduce the SSD garbage collection overhead.Experimental results show that the RIA algorithm can improve SSD write performance by 1.78 times compared to existing hashing algorithms.To address the I/O contention and long-tail latency issues caused by hash resizing,we propose incremental resizing in last layer algorithm(IRLL).IRLL allocates a virtual memory space for the new expansion layer and only performs fine-grained resizing for conflicting paths to reduce data migration.As the SSD only allocates physical NAND flash pages when we first write to a virtual page,IRLL significantly improves the physical space utilization.The experimental results show that the space and time overhead of the IRLL expansion algorithm is independent of the amount of written data in the hash table compared to existing hashes.The average execution time of the resizing is reduced by 3 orders of magnitude for a table with 3072 initial capacity.The shadow read(SR)and table jumping read(TJR)algorithms are proposed to improve the read performance under the multi-layer table structure of Tiered Hashing.SR algorithm leverage shadow keys in the upper layer to improve read hit rate.TJR algorithm uses the upper bucket storage lookup layer to achieve accelerated lookup.Experimental results show that the SR algorithm reduces the average latency by 29.5%,the TJR algorithm reduces the average latency by 61.5%,and the SR and TJR algorithms can reduce the average latency by 74.8% when both algorithms are enabled.
Keywords/Search Tags:NAND flash, Multi-stream logging, Tiered hashing, Unified memory-storage hierarchy
PDF Full Text Request
Related items