Font Size: a A A

Concurrent Hash Tables On Multi-core Systems

Posted on:2019-07-22Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z W ChenFull Text:PDF
GTID:1318330542472268Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
As the development of multi-processor techniques and the increasing of the number of cores integrated into a single CPU chip the bottleneck of computing power of the processor is constantly being broken.It also makes designing highly scalable concurrency data structures and developing high-performance software systems based on multi-core architectures more complicated.Concurrent hash table(CHT)is an important concurrent data structure and it is widely used to implement software systems on multi-core architectures as its queries run in amortized constant time.The hardware architecture,cache coherence protocol,memory bandwidth,memory access latency and multi-thread synchronization mechanism impact the designing,optimization and application of CHT significantly.The works of this paper in-cluding:First,we present a framework,named CHTBench,which provides a fair testing en-vironment and unified interface for the experiments by hiding the discrepancies of hard-ware platforms,synthesized workloads,concurrency models,and compiler configurations.In this way,we can guarantee the experimental results generated from our framework are fairly comparable between different CHTs.An open source library,sspfd,is integrated into CHTBench to measure the access latency of different kinds of operations.With CHTBench it's easy to compare the performance of CHTs.And it can also combine with other sys-tem tools such as likwid to measure cache hit/miss rate,memory bandwidth overheads and cross-socket node communication overheads from micro perspective.Second,we dissected 5 state of the art CHTs on CHTBench.The evaluations are explored from a wide range of perspectives including thread scalability,throughput,la-tency,memory hierarchy impact,low-level synchronization primitives,and memory us-age.The inter-correlations between relevant metrics are also discussed when necessary.The experiments are conducted on four major hardware platforms including Intel Many In-tegrated Core(MIC)architecture and three representative Non-uniform Memory Architec-ture(NUMA)systems.We ported CHTs to the MIC platform,and to our knowledge,this is the first extensive study of concurrent hash tables on Intel MIC architecture.According to the experimental results,(Eeight principles of best practices in designing and optimizing concurrent hash tables are summarized which lay the theoretical and practical foundation for further design of high scalability concurrent hash tables based on multi-core systems in the future.Then,inspired by the fine-grained Cache Line Hash Table,we implemented a concur-rent cache-line hash table with hardware transactional memory(HTM-CLHT).The size of a HTM-CLHT bucket is padding to the size of cache line(64 bytes in our testbed).The HTM-CLHT takes the whole hash table as the critical section which can provides optimistic concurrenty control by allowing threads to run in parallel with minimal interference.When running workloads which large than the capacity of LLC,the performance of HTM-CLHT is twenty percent better than using traditional fine-grained locks.HTM-CLHT achieves the goal that using a simple,coarse-grained locking method,obtaining high performance which matched with sophisticated synchronization methods such as fine-grained locking and non-blocking.In order to eliminate the Lemming effect of Intel TSX,we presented two software-assisted techniques,lock removal(SLR)and conflict management(SCM).Both of these methods improve the performance of the HTM-based concurrent hash table more sig-nificantly than the RTM Retry mechanism recommended by Intel.At last,we presented a concurrent Cuckoo Filter based on partial-key Cuckoo Hash-ing.To our knowledge,this is the only concurrent filter so far,and its also the only one which based on hardware transactional memory(HTM).The standard Bloom Filter do not support delete elements from filter,while other extended versions of Bloom Filter support the deletion of filters with high space overheads.The structure of Cuckoo Hashing is in a set-associative way,each element can map to several slots.Taking this feature of Cuckoo hash table,we extract the fingerprints of the members of a set and store them in a hash table.And items with same fingerprint are fine,i.e.the filter can store several identical fin-gerprint.To delete an item from Cuckoo Filter,the fingerprint of this item is deleted from filter.If there is another item has the same fingerprint,we can still find this item with a copy of this fingerprint.This can delete an element very easy and straightforward without any additional space overheads.According to our experiment results,when running with read-only workloads,the max throughput is 38 times of the throughput of a single thread,and running workloads with ten percent update operations,the max throughput is 11 times of the throughput running workload with a single thread.
Keywords/Search Tags:Multi-core System, Cache Coherence, Concurrent Hash Table, Synchronization, Non-uniform Memory Architecture, Hardware Transactional Memory
PDF Full Text Request
Related items