Font Size: a A A

Design And Implementation Of High Efficient Data Migration Mechanism For Key-value Store

Posted on:2020-09-06Degree:MasterType:Thesis
Country:ChinaCandidate:J H MengFull Text:PDF
GTID:2428330590958374Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In recent years,with the development of the mobile Internet,the key-value storage system is widely used.It is suitable for storing large-scale small-sized objects and supporting high-performance and high-concurrency data-intensive applications.Each machine in the cluster usually has tens of billions of key-value pairs.Most of the data's access frequency changes from high to low over time and will not even be accessed later.Therefore,in order to reduce the storage cost,a large number of long-term unvisited keyvalue pairs need to be migrated to the lower-level storage cluster with cheaper storage media and higher compression ratio.According to the above requirements,an efficient data migration mechanism for keyvalue store is proposed.The accessed key-value pairs are fully sampled within the sampling period,and then the key-value pairs that have been accessed and not accessed are identified,and the key-value pairs that have not been accessed are migrated.The data sampling process uses a high-concurrency blocked Bloom Filter on the critical path of system access,which guarantees the high performance of multi-threaded sampling through atomic operations and CPU cache prefetching.By designing an algorithm that periodically transforms the Bloom Filter hash function,the impact of the Bloom Filter false positive rate is reduced while ensuring that the time space complexity is constant;And use the user-level Read-Copy Update mechanism to replace the global lock for protecting the global bloom filter,so that the multi-thread sampling to the Bloom filter and the background thread backup and update the Bloom filter can be performed completely concurrently.The data migration process first uses the LevelDB version mechanism to implement database snapshot backup between multiple processes,and then uses multiple threads to traverse database snapshot data in parallel,identify and bulk migrate key-value pairs,ensuring that the entire process is transparent to user access data;As the migrated data is marked with the migrated mark,handling write conflicts between tagging operations and user update operations through the version number.User can access the key-value pairs that are migrated to the lower-level storage cluster using the asynchronous access mechanism of the storage side.After several experiments,the designed migration mechanism can operate normally.The test results show that,in the high concurrent environment,the designed blocked Bloom Filter only increases the delay of 0.2 microseconds,and the user-level Read-Copy Update mechanism improves the read-side performance by two orders of magnitude compared to the global lock mechanism,indicating that the data sampling does not affect the system performance.Only 7.5% of the user access latency is added during the data migration process,and batch migration key-value pairs can achieve at least a 1x increase in throughput compared to migrating individual key-value pairs each time.
Keywords/Search Tags:Key-value store, Bloom Filter, Data migration
PDF Full Text Request
Related items