| With the advantages of low latency and high bandwidth, in-memory storage systems can efficiently underpin new real-time applications such as artificial intelligence inference and e-commerce, but their storage capacity is hard to scale. High-density persistent memory (PM) and RDMA networks supporting remote direct access bring new opportunities for building high-capacity distributed in-memory storage systems. However, traditional storage systems isolate network and storage, which leads to conflicts between data sharing and concurrency efficiency, and between remote direct access and local management, thus preventing the hardware from delivering its full performance. In response to these problems, this dissertation introduces the idea of network-storage co-design, which lets distributed in-memory storage systems distribute data/metadata according to network characteristics and manage storage resources directly on the network path. Specifically, with network-storage co-design, this dissertation studies general distributed techniques, including distributed data organization structures and distributed protocols: · A scalable PM data organization structure for NUMA architectures is proposed. This dissertation shows for the first time that the limited PM write bandwidth invalidates the traditional idea of "reducing cross-NUMA accesses by consuming local bandwidth"; it then designs a new data organization structure that selectively distributes hot data into local PM and global DRAM to eliminate the associated remote PM accesses while avoiding additional local bandwidth consumption. Experiments show that the structure improves throughput by up to 2.3×. · A write-optimized data organization structure for RDMA networks is proposed. It is based on the B+Tree and improves overall write performance via multi-level collaborative optimization: it introduces a hierarchical lock that uses the on-chip memory of RDMA NICs to accelerate concurrent accesses; it exploits the in-order delivery property of RDMA to combine multiple network requests and thus reduce the number of network round trips; and it tailors the data structure layout to reduce RDMA write amplification. Experiments show that the structure improves throughput by one order of magnitude under typical write-intensive workloads. · A cache coherence protocol based on programmable switches is proposed. The protocol manages cache coherence on the network path of requests and offloads tasks such as serialization and multicast to the switch, thereby reducing distributed coordination overhead. A distributed shared memory (DSM) system is implemented using the protocol; it obtains 4.2×, 2.3×, and 2× speedups on key-value store, graph engine, and transaction processing workloads, respectively. · A PM replication protocol based on a new NIC primitive is proposed. First, this dissertation designs a new RDMA NIC primitive that manages the PM space in a one-sided manner, thus overcoming the conflict between remote direct access and local management faced by traditional one-sided primitives. Then, using this primitive, it designs a replication protocol based on the log-structured approach and builds a corresponding distributed PM key-value storage system. Experiments show that the protocol reduces latency by 2.11× and largely eliminates device-level write amplification. |
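The NUMA-aware placement in the first contribution can be illustrated with a toy policy. This is a minimal sketch, not the dissertation's algorithm: the thresholds `HOT_THRESHOLD` and `LOCALITY_RATIO`, the `ItemStats` counters, and the rule "hot items dominated by one node go to that node's PM; hot items shared by several nodes go to DRAM" are all hypothetical. The point it shows is that serving widely shared hot data from DRAM avoids both remote PM accesses and extra copies that would spend each node's limited PM write bandwidth.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, Tuple

HOT_THRESHOLD = 100   # hypothetical: accesses per epoch that make an item "hot"
LOCALITY_RATIO = 0.8  # hypothetical: share of accesses from one node that counts as "local"


@dataclass
class ItemStats:
    home_node: int                                      # node whose PM holds the item by default
    accesses: Counter = field(default_factory=Counter)  # NUMA node id -> access count

    def record(self, node: int) -> None:
        self.accesses[node] += 1


def place(stats: ItemStats) -> Tuple[str, Optional[int]]:
    """Return (medium, node) for one item under this toy policy."""
    total = sum(stats.accesses.values())
    if total < HOT_THRESHOLD:
        # Cold: leave it in its home node's PM; remote accesses to it are rare.
        return ("PM", stats.home_node)
    node, count = stats.accesses.most_common(1)[0]
    if count / total >= LOCALITY_RATIO:
        # Hot and dominated by one node: keep it in that node's local PM so
        # the dominant accessor no longer reaches across NUMA into remote PM.
        return ("PM", node)
    # Hot and shared by several nodes: serve it from (global) DRAM instead of
    # copying it into every node's PM, which would consume scarce PM write bandwidth.
    return ("DRAM", None)


stats = ItemStats(home_node=0)
for _ in range(95):
    stats.record(1)
for _ in range(5):
    stats.record(0)
print(place(stats))  # ('PM', 1)
```

A real design would also weigh migration cost and distinguish reads from writes; the sketch omits both.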
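The hierarchical lock in the second contribution can be sketched as a two-level scheme, assuming that threads on the same compute server first serialize on a local lock, so that at most one of them issues remote compare-and-swap (CAS) operations against a lock word held in the NIC's on-chip memory. The classes `OnChipLockTable` and `ComputeNode` are hypothetical; a Python mutex stands in for the NIC's atomic CAS, and no real RDMA verbs are called.

```python
import threading
import time


class OnChipLockTable:
    """Simulated lock words in RDMA NIC on-chip memory (hypothetical model).

    A real system would issue RDMA CAS verbs against NIC memory; here an
    internal mutex stands in for the NIC's atomicity guarantee.
    """

    def __init__(self, n_locks: int) -> None:
        self._words = [0] * n_locks   # 0 = free, otherwise owner node id + 1
        self._atomic = threading.Lock()
        self.cas_ops = 0              # number of simulated remote CAS round trips

    def cas(self, idx: int, expected: int, new: int) -> bool:
        with self._atomic:
            self.cas_ops += 1
            if self._words[idx] == expected:
                self._words[idx] = new
                return True
            return False

    def release(self, idx: int) -> None:
        with self._atomic:            # stands in for an RDMA write of 0
            self._words[idx] = 0


class ComputeNode:
    """Two-level lock: threads on one node serialize locally before going remote."""

    def __init__(self, node_id: int, table: OnChipLockTable, n_locks: int) -> None:
        self.node_id = node_id
        self.table = table
        self.local = [threading.Lock() for _ in range(n_locks)]

    def acquire(self, idx: int) -> None:
        self.local[idx].acquire()     # level 1: contention among local threads stays on-node
        while not self.table.cas(idx, 0, self.node_id + 1):  # level 2: remote CAS
            time.sleep(0)             # yield; real code would back off

    def release(self, idx: int) -> None:
        self.table.release(idx)       # release the global word first,
        self.local[idx].release()     # then let the next local thread in
```

Because only one thread per compute node ever spins on the remote word, the number of concurrent CAS requests hitting the NIC is bounded by the number of nodes rather than the number of threads.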
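Request combining relies on RDMA writes posted on one reliable-connection (RC) queue pair being delivered in order. The toy model below assumes a common case, writing back a modified tree node and then releasing its lock, and simply counts round trips. `SimulatedRCQueuePair` and the two helper functions are hypothetical and do not correspond to the verbs API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SimulatedRCQueuePair:
    """Toy model of an RDMA RC queue pair (hypothetical, not the verbs API).

    Requests in one posted batch are applied remotely in posting order, and
    the client waits for a single completion per batch: one round trip.
    """
    remote_memory: dict = field(default_factory=dict)
    round_trips: int = 0
    log: List[Tuple[str, int]] = field(default_factory=list)

    def post_batch(self, writes: List[Tuple[str, int]]) -> None:
        for addr, value in writes:  # applied strictly in posting order
            self.remote_memory[addr] = value
            self.log.append((addr, value))
        self.round_trips += 1       # wait only for the last completion


def update_then_unlock_naive(qp: SimulatedRCQueuePair, node_addr: str, value: int, lock_addr: str) -> None:
    qp.post_batch([(node_addr, value)])  # write back the tree node and wait
    qp.post_batch([(lock_addr, 0)])      # then release the lock and wait again


def update_then_unlock_combined(qp: SimulatedRCQueuePair, node_addr: str, value: int, lock_addr: str) -> None:
    # In-order delivery guarantees the unlock lands after the node write,
    # so both can be posted together and the client waits once.
    qp.post_batch([(node_addr, value), (lock_addr, 0)])


naive, combined = SimulatedRCQueuePair(), SimulatedRCQueuePair()
update_then_unlock_naive(naive, "leaf7", 42, "lock7")
update_then_unlock_combined(combined, "leaf7", 42, "lock7")
print(naive.round_trips, combined.round_trips)  # 2 1
```

The remote state is identical in both cases; only the number of times the client waits on the network changes.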
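The switch-based coherence protocol in the third contribution can be pictured as a directory that sits on the request path. In the hypothetical model below, `SwitchDirectory.handle` is invoked once per request in arrival order, which stands in for the switch serializing conflicting requests, and invalidations to all other sharers are recorded as a single multicast. It uses a simplified MSI-style state machine chosen for illustration; the dissertation's actual protocol states and message formats are not given here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DirEntry:
    state: str = "I"                                # "I" invalid, "S" shared, "M" modified
    sharers: Set[int] = field(default_factory=set)  # nodes caching the line


class SwitchDirectory:
    """Toy coherence directory held in a switch (hypothetical model).

    handle() is called once per request in arrival order, modeling how the
    switch serializes conflicting requests to the same cache line.
    """

    def __init__(self) -> None:
        self.dir: Dict[int, DirEntry] = {}
        self.multicasts: List[Tuple[int, Set[int]]] = []  # (line, destination nodes)

    def handle(self, node: int, line: int, op: str) -> Set[int]:
        """Apply one request and return the nodes the switch multicasts to."""
        e = self.dir.setdefault(line, DirEntry())
        others = e.sharers - {node}
        if op == "read":
            if e.state == "M" and not others:
                return set()      # the requester already owns the line
            # A remote owner of a Modified line is told to write back and downgrade.
            targets = others if e.state == "M" else set()
            e.state = "S"
            e.sharers.add(node)
        elif op == "write":
            targets = others      # every other cached copy is invalidated
            e.state = "M"
            e.sharers = {node}
        else:
            raise ValueError(f"unknown op {op!r}")
        if targets:
            # One packet replicated by the switch instead of N unicasts from the requester.
            self.multicasts.append((line, set(targets)))
        return targets
```

Keeping the directory in the switch means the requester learns the outcome after a single trip through the network instead of first contacting a separate home node.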
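The log-structured replication in the fourth contribution can be sketched under the assumption of a hypothetical one-sided `nic_append` primitive: the client names a log rather than a remote address, and the (simulated) NIC picks the offset by advancing the log tail. This illustrates how the server can keep control of PM space allocation while the client still bypasses the server CPU. The record format, `PMLog`, and `ReplicatedKV` are assumptions for illustration only, and log cleaning is not modeled.

```python
import struct
from typing import Dict, List, Tuple


class PMLog:
    """Append-only log in one server's PM; the tail pointer is advanced by the NIC."""

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.tail = 0

    def nic_append(self, record: bytes) -> int:
        """Hypothetical one-sided 'append' primitive executed by the NIC.

        The client never supplies an address: the NIC chooses the offset from
        the tail, so space allocation stays under server control.
        """
        if self.tail + len(record) > len(self.buf):
            raise MemoryError("log full; server-side cleaning would reclaim space")
        off = self.tail
        self.buf[off:off + len(record)] = record
        self.tail += len(record)
        return off


def encode(key: bytes, value: bytes) -> bytes:
    # Assumed record layout: 4-byte key length, 4-byte value length, key, value.
    return struct.pack("<II", len(key), len(value)) + key + value


def decode(buf: bytearray, off: int) -> Tuple[bytes, bytes]:
    klen, vlen = struct.unpack_from("<II", buf, off)
    start = off + 8
    return bytes(buf[start:start + klen]), bytes(buf[start + klen:start + klen + vlen])


class ReplicatedKV:
    """Out-of-place updates: every put appends a new record to every replica's log."""

    def __init__(self, replicas: List[PMLog]) -> None:
        self.replicas = replicas
        self.index: Dict[bytes, int] = {}  # key -> offset of its latest record

    def put(self, key: bytes, value: bytes) -> None:
        record = encode(key, value)
        offsets = {r.nic_append(record) for r in self.replicas}
        # Replicas receive identical appends in the same order, so offsets agree.
        if len(offsets) != 1:
            raise RuntimeError("replica logs diverged")
        self.index[key] = offsets.pop()

    def get(self, key: bytes) -> bytes:
        k, v = decode(self.replicas[0].buf, self.index[key])
        if k != key:
            raise KeyError(key)
        return v
```

Because every update lands at the log tail, each replica's PM sees sequential writes, the access pattern that tends to avoid the device-level write amplification caused by small random writes.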