Font Size: a A A

Research On Indexing Mechanisms For Flash Based Devices

Posted on:2018-07-28Degree:DoctorType:Dissertation
Country:ChinaCandidate:C C YangFull Text:PDF
GTID:1318330512485615Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of technological manufacture,there is a large increase in the storage density of flash memories.The solid state drive(SSD),which based on flash memory arrays,emerges as required,and its popularity rises quickly in industrial storage systems and desktop PCs.Since SSD physically and logically shares a common interface with the magnetic disk,it has been called a "pivotal technology" that could revolutionize storage systems.As a result,SSD has emerged as a new alternative to replace magnetic disks in both academia and industry.However,SSD has some unique features different from the magnetic disk,for example,erase before write characteristics of the NAND flash,asymmetric read/write latencies,and limited block erase count.Therefore,traditional data management schemes,which are designed for I/O symmetric megenetic disks,can not fully utilize the high performance of SSDs.It is of crucial importance to consider SSD's new features,and optimize data management shcmes so as to maximize the system performance.Indexes are crucial to data retrieval,and they are widely used to accelerate data accesses on specific objects.Traditional indexes are oriented to magnetic disks with symmetric I/Os,and updates to indexes lead to a large number of random writes.Due to random writes on SSDs are especially slow,deploying indexes directly cannot completely exert the high performance of SSDs.Consequently,there have been a great deal of research on index mechanisms for SSDs.At present,the researches of flash optimized indexes can be classified into three categories according to index structures:(1)flash optimized hash indexes;(2)flash optimized tree indexes;(3)flash optimized bitmap indexes.Previous researches mainly aim to reduce random writes on flash memories,and the major technical methods include trading reads for writes,batch updates,out of place updates,transforming random writes into sequential writes,and so on.In this paper,we point out some limits among present studies:(1).previous researches focus mainly on reducing random writes to SSDs,which is typically accomplished at the expense of a substantial number of extra reads.However,considering the development of internal control technology of modern SSDs,they show a narrowing gap between read and write speeds,and read operations on SSDs increasingly affect the overall performance of indices on SSDs;(2)previous flash optimized indexes usually work well in update intensive workloads,while they have a poor performance compared to traditional indexes in search intensive workloads;(3)previous researches rarely take into account the internal parallelism of SSDs to further improve performance.Therefore,it's necessary to optimize indexes for modern SSDs with a narrowing gap between read and write speeds,and also make them sutible for all kinds of datasets.In this paper,we do researches on the linear hashing and the B+-Tree.We propose a new hash index for SSDs that is able to adapt its structure according to the change of access patterns,and further optimize its search performance.In addition,we also provide the theory basis for read/write optimized B+-Tree.The buffer pool can reduce accesses to disks,and its efficiency is critical to the overall read-write performance of indexes.In this paper,we also discuss the contradictions between access patterns on tree indexes and flash oriented buffer managements.Traditional SSD-oriented buffering methods assign special priorities to dirty pages to reduce random writes,and in this scenario,internal nodes are more likely to be evicted than leaf nodes since they have a high probability to be clean.On the other hand,internal nodes have higher access frequencies than leaf nodes have,and evicting them decreases the hit ratio.We present a new buffering scheme that takes into account frequency,recency and the state(clean or dirty)of the page to select the victim when out-of-memory.The problems mentioned above have been solved successfully.In summary,we make the following contributions.(1)We propose an SSD-optimized linear hashing index called Self-Adaptive Linear Hashing(SAL-Hashing)which can adapt its structure according to access patterns.SAL-hashing use the technical method of batch updates,and introduce the concept of group and set to improve the batch updates efficiency.Updates to the index are first buffered in memory,and then multiple update operations to the same set are committed to its corresponding log region in batch.In addition,SAL-hashing decides whether to merge a log region to the corresponding set in an online manner according to the access pattern.For a search-intensive set,it timely merges its log region to the corresponding buckets,and then avoid reading the log region in subsequent searches.For an update-intensive set,it retains the log region to keep the efficiency of batch updates.Furthermore,in order to exploit the internal parallelisms of SSDs,we apply coarse-grained writes for merging operations to achieve a high bandwidth.(2)We do research on the relationship between overflow buckets and split points in the linear hashing,and add a memory efficient structure to the SAL-hashing,and then almost every search on an arbitrary bucket is finished by one page read.This provides a comparable performance to the extendible hashing.Moreover,we also discuss issues of transaction support and crash recovery.(3)We propose a new buffering scheme for tree indexes on SSDs,it considers the probability of an index page to be accessed and its recency to detect the hotness,and then combines its state(clean or dirty)to select the victim.Besides,it packs cold dirty pages,and use a coarse-grained write technique to utilize the internal parallelism of SSDs,which also avoids small random writes.(4)Based on our lab's previously proposed read/write optimized B+-Tree,we introduce detailed cost analysis,and add discussions on concurrent access.In addition,we also conduct a series of re-designed experiments,and analyze experimental results in detail.
Keywords/Search Tags:SSD, read/write asymmetry, internal parallelism, linear hashing, buffer management, B+-Tree
PDF Full Text Request
Related items