
Research On Exploring And Exploiting Parallelism Of Large-scale Flash-based Devices

Posted on: 2016-12-04
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R H Wang
Full Text: PDF
GTID: 1108330509961009
Subject: Electronic Science and Technology
Abstract/Summary:
With the rapid development of technologies such as the internet, multimedia, and sensors, the amount of global information is growing exponentially. The digitization of society increases the demands on the capacity and service quality of storage systems, while the gap between the capabilities of traditional storage systems and computing capability continues to widen. The emergence of non-volatile solid-state storage media eliminates the mechanical parts that traditional magnetic disks use to access data, bringing a revolutionary change to storage technology. Among the various solid-state media, flash memory is advancing rapidly and is expected to greatly shape the future of storage systems. Flash-based devices outperform magnetic disks in bandwidth, latency, and random I/O. Moreover, they offer higher power efficiency and greater robustness to shock and vibration, significantly improving system reliability. This thesis focuses on exploring and exploiting the parallelism of large-scale flash-based devices, covering buffer management, storage management, I/O queue scheduling, and operation parallelization. The main work and contributions are as follows:

1. A buffer replacement policy with a channel-aware write reordering mechanism (Chapter 2)

A modern solid-state drive (SSD) uses an on-disk buffer between the host interface and the flash translation layer (FTL). The buffer first stores data from the host and later writes it to the NAND flash memory. The replacement policy employed by the write buffer should take both the write sequence and the FTL algorithm into account. Furthermore, since an SSD usually contains several independent channels, the replacement policy is also responsible for scheduling write requests among these channels to exploit parallelism.
Most existing buffer replacement policies do not consider hardware parallelism, and their eviction sequences contain many consecutive pages, even from a page-level buffer. When these policies are applied to a multi-channel device, directly striping a consecutive sequence across separate channels does exploit parallelism; however, it significantly increases the overhead of garbage collection, because splitting a consecutive write sequence among multiple channels breaks the spatial locality of the write traffic. This work proposes a channel-aware write reordering (CAWR) mechanism that recognizes the patterns of pages about to be evicted from the buffer. CAWR uses a reordering region to encapsulate correlated pages into clusters. When buffer replacement is required, CAWR schedules one cluster to each channel, guaranteeing that all channels operate in parallel. Since all pages in a cluster are correlated, scheduling the entire cluster to a single channel keeps the spatial locality of the write traffic intact, reducing the overhead of garbage collection. Moreover, because CAWR only reorders pages at the tail of the buffer, it barely affects the buffer hit ratio.

2. A multi-channel FTL with unbalanced channel management (Chapter 3)

A typical NAND flash-based SSD consists of multiple independent channels that can operate in parallel. This architecture leaves the exploitation of parallelism to the FTL, which must improve channel utilization and channel-level parallelism. At the same time, the FTL performs several internal tasks such as garbage collection (GC) and wear leveling. These internal processes introduce extra operations and latencies, causing I/O blocking and performance degradation, especially when the device is nearly full.
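The clustering idea behind CAWR can be illustrated with a small sketch. The thesis does not publish code, so the function name and the clustering rule (treating logically consecutive pages as one correlated cluster) are illustrative assumptions:

```python
# Hypothetical sketch of the CAWR idea: group correlated pages at the
# eviction end of the buffer into clusters, then dispatch one whole
# cluster per channel. Names and the clustering rule are assumptions.

def cluster_evictions(tail_pages, num_channels):
    """tail_pages: logical page numbers at the eviction end of the buffer,
    in eviction order. Returns one write list per channel."""
    # Step 1: pack runs of consecutive logical pages into clusters,
    # since consecutive pages are spatially correlated.
    clusters = []
    current = [tail_pages[0]]
    for lpn in tail_pages[1:]:
        if lpn == current[-1] + 1:   # still the same sequential run
            current.append(lpn)
        else:                        # run broken: start a new cluster
            clusters.append(current)
            current = [lpn]
    clusters.append(current)

    # Step 2: dispatch one entire cluster to each channel, so every channel
    # works in parallel while each sequential run stays on a single channel,
    # preserving the spatial locality that lowers GC overhead.
    channels = [[] for _ in range(num_channels)]
    for i, cluster in enumerate(clusters):
        channels[i % num_channels].extend(cluster)
    return channels

# Evicting pages 100-103 and 200-201 on a 2-channel device keeps each
# sequential run intact on one channel instead of striping it page-by-page.
plan = cluster_evictions([100, 101, 102, 103, 200, 201], 2)
print(plan)  # [[100, 101, 102, 103], [200, 201]]
```

Note the contrast with plain page-by-page striping, which would place pages 100 and 101 on different channels and so scatter a sequential run across the device.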
To eliminate the blocking introduced by garbage-collection activities, this thesis presents a multi-channel flash translation layer with unbalanced channel management (U-MFTL). U-MFTL divides the FTL into two layers: an SSD-level layer and a channel-level layer. The channel-level layer manages the address space within each channel; that is, each channel has a dedicated garbage collector and a dedicated block allocator. The SSD-level layer schedules I/O requests among channels globally. Its novelty lies in scheduling garbage collection and request service unevenly among the channels, eliminating collisions between external host operations and internal cleaning operations so that the different activities can proceed in parallel. Trace-driven simulations show that the proposal eliminates I/O blocking and improves device performance without causing load imbalance or degrading cleaning efficiency.

3. A command queue that exploits flash-level parallelism (Chapter 4)

Flash memory provides flash-level parallelism at the die and plane levels inside a flash chip, and flash manufacturers offer several advanced commands intended to exploit this in-chip parallelism by handling read, write, and erase operations more efficiently. However, these commands come with strict restrictions that make it difficult for the command scheduler to pack basic commands into parallel commands. Currently, flash-level parallelism is exploited by striping physical addresses across parallel units when allocating addresses to write requests. However, because the FTL does not consider the queue status of outstanding requests at the plane, die, and package levels, hardware conflicts may occur, degrading hardware utilization.
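The unbalanced-scheduling idea of U-MFTL can be sketched as follows. This is not the thesis implementation; the class, the free-block bookkeeping, and the 10% GC threshold are assumptions made for illustration:

```python
# Illustrative sketch of U-MFTL's unbalanced scheduling: at any moment at
# most one channel runs garbage collection, and host writes are steered to
# the remaining channels so external and internal activities never collide.

class UnbalancedScheduler:
    def __init__(self, num_channels, gc_threshold=0.1):
        self.num_channels = num_channels
        # Per-channel fraction of free blocks (assumed bookkeeping).
        self.free_ratio = [1.0] * num_channels
        self.gc_channel = None           # channel currently cleaning, if any
        self.gc_threshold = gc_threshold

    def pick_gc_channel(self):
        """Start GC on the fullest channel once it drops below threshold."""
        worst = min(range(self.num_channels), key=lambda c: self.free_ratio[c])
        self.gc_channel = worst if self.free_ratio[worst] < self.gc_threshold else None
        return self.gc_channel

    def route_write(self):
        """Send a host write to the emptiest channel that is NOT cleaning."""
        candidates = [c for c in range(self.num_channels) if c != self.gc_channel]
        return max(candidates, key=lambda c: self.free_ratio[c])

sched = UnbalancedScheduler(4)
sched.free_ratio = [0.05, 0.5, 0.6, 0.4]
sched.pick_gc_channel()       # channel 0 is nearly full, so it cleans
print(sched.route_write())    # 2  (emptiest non-cleaning channel)
```

The deliberate unevenness is the point: a balanced scheduler would keep sending writes to channel 0 as well, forcing them to queue behind its cleaning operations.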
This work proposes a command queue with multiple lines per flash chip, in which operations can more easily be packed into advanced parallel commands, thereby utilizing flash-level parallelism. In addition, the work proposes an address allocation strategy based on queue utilization. Since the queue reflects how busy the underlying hardware is, this allocation strategy creates more parallelizable operations, improving hardware utilization.

4. A garbage collector that parallelizes both cleaning operations and cleaning activities (Chapter 5)

Flash-based devices internally provide multiple levels of parallelism, and parallelizing flash operations is the key to improving performance. Most existing work dispatches and schedules host requests so that the resulting sub-requests can be served in parallel; however, such work seldom parallelizes the extra operations introduced by the internal garbage collection process. The costly operation sequence of garbage collection is the main cause of I/O blocking. This work proposes a novel Subdivided Garbage Collector (SGC), which exploits both system-level and flash-level parallelism to parallelize garbage-collecting operations as well as garbage-collecting activities. SGC confines the GC process within a single flash chip, using system-level parallelism to overlap garbage-collecting activities on one chip with I/O service on the other chips. Flash-level parallelism is further exploited with a novel queue mechanism that reorders the partial steps of the cleaning sequence and packs them into parallel operations. To enable more parallelization, a dynamic conflict-aware address allocator is proposed to keep host writes and cleaning operations from contending for the critical components of the device. Trace-driven simulations demonstrate that the proposals hide garbage-collection overheads, resulting in shorter response times.
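The packing step of the proposed command queue can be sketched with a simplified model. Real multi-plane command restrictions are stricter than shown here; as an illustration we assume two operations can be combined only if they target different planes of the same die, are the same operation type, and address the same in-block page offset:

```python
# Simplified sketch of packing basic flash commands into multi-plane
# commands from per-plane queue lines. The packing rule is an assumed,
# simplified version of real multi-plane restrictions.

from collections import deque

def pack_multiplane(plane_queues):
    """plane_queues: one deque of (op, die, plane, page) per plane line.
    Returns a list of batches; each batch executes as one parallel command."""
    batches = []
    while any(plane_queues):
        heads = [q[0] for q in plane_queues if q]
        op, die, _, page = heads[0]
        batch = []
        for q in plane_queues:
            # Pop a queue head only if it is packable with the first head:
            # same operation type, same die, same page offset.
            if q and q[0][0] == op and q[0][1] == die and q[0][3] == page:
                batch.append(q.popleft())
        batches.append(batch)
    return batches

q0 = deque([("write", 0, 0, 5), ("read", 0, 0, 9)])
q1 = deque([("write", 0, 1, 5)])
print(pack_multiplane([q0, q1]))
# The two writes pair into one multi-plane command; the read issues alone.
```

With one queue line per plane, packable operations surface at the queue heads together, which is exactly what a single flat per-chip queue makes hard to arrange.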
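The chip-confinement idea of SGC can also be sketched. The step granularity (read a valid page, write it back, finally erase) and all names below are assumptions for illustration, not the thesis code:

```python
# Hedged sketch of SGC: garbage collection on one chip is subdivided into
# small steps so a scheduler can confine cleaning to that chip while host
# I/O keeps flowing to the other chips in the same tick.

def gc_steps(victim_block, valid_pages):
    """Yield the subdivided cleaning steps for one victim block."""
    for page in valid_pages:
        yield ("copy_read", victim_block, page)
        yield ("copy_write", victim_block, page)
    yield ("erase", victim_block, None)

def schedule(num_chips, gc_chip, gc, host_io):
    """Interleave one GC step on gc_chip with host I/O on every other chip
    per tick, so cleaning never blocks the rest of the device."""
    timeline = []
    while True:
        tick = []
        step = next(gc, None)
        if step is not None:
            tick.append((gc_chip, step))       # cleaning stays on its chip
        served = False
        for chip in range(num_chips):
            if chip != gc_chip and host_io.get(chip):
                tick.append((chip, host_io[chip].pop(0)))
                served = True
        if step is None and not served:
            break
        timeline.append(tick)
    return timeline

io = {1: ["hostA", "hostB"], 2: ["hostC"]}
plan = schedule(num_chips=3, gc_chip=0, gc=gc_steps(7, [3]), host_io=io)
# Each tick carries one GC step on chip 0 plus host requests on chips 1-2.
```

A monolithic GC would instead occupy the whole device for the full read/write/erase sequence; subdividing it is what lets the cleaning steps be overlapped, reordered, and packed.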
Keywords/Search Tags: Flash memory, Buffer management, Flash translation layer, Hardware utilization, Command queue, Garbage collection, Operation parallelization