| Remote Direct Memory Access (RDMA) technology provides new possibilities for building high-speed distributed storage systems. However, the communication characteristics of RDMA differ from those of traditional Ethernet, which prompts distributed systems to reconsider the thread architecture of the communication module and the way data is transmitted. In existing distributed block storage systems, thread switching and lock contention incur heavy overhead during transmission. To solve this problem, a penetrating communication model is proposed, in which a single thread runs through the entire RDMA transmission cycle, avoiding the thread-switching and lock-contention overhead of multi-threaded communication. At present, most distributed storage systems use only some of the RDMA transmission mechanisms for data transmission; the transmission process is complex and does not perform well across data of different sizes. Based on an analysis of the scenarios to which the different RDMA transmission mechanisms are suited, a read-write conversion transmission strategy is proposed: the READ and WRITE primitives are used for the write and read processes, respectively, to optimize transmission. The traditional methods of obtaining RDMA completion notifications cannot achieve both low CPU usage and low acquisition latency. To solve this problem, dynamic counting polling is proposed: after an interrupt notification is received, polling mode is maintained by a counter, and the upper limit of the count is adjusted dynamically based on past counting records, so that completion notifications are obtained efficiently without occupying the CPU for long periods. In addition, a contention-free memory management scheme is proposed to address registered-memory contention and extra memory copies during transmission in the distributed block storage system. An independent RDMA memory pool is used to avoid memory contention, and the transmission path is optimized to avoid
copying data between registered memory and ordinary memory. These RDMA communication optimizations are applied to the self-developed distributed block storage system Flame. Tests show that the penetrating communication model reduces latency by 44.96%~79.58% compared with the Ceph communication model; compared with the Octopus+ and Ceph transmission strategies, the read-write conversion transmission strategy reduces latency by 6.42%~42.43% and 9.04%~30.15%, increases bandwidth by 5.90%~105.39% and 1.90%~46.16%, and increases IOPS by 1.54%~104.90% and 1.90%~45.65%, respectively; the CPU usage of dynamic counting polling is 40.40% lower than that of busy polling, and its acquisition latency is 36.17% lower than that of interrupts; and the contention-free memory management scheme, compared with the system before optimization, reduces latency by 13.93%~19.61%, increases bandwidth by 13.93%~14.01%, and increases IOPS by 32.49%~33.64%. The overall test shows that, compared with the distributed block storage systems Ceph and Sheepdog, Flame increases bandwidth by 17.61%~59.54% and 655.81%~904.75%, respectively, and increases IOPS by 57.80%~487.70% and 85.04%~394.22%, respectively. |
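
The control flow of dynamic counting polling described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Flame's implementation: the class name, the `poll_cq` and `wait_for_interrupt` callbacks, and the default limits are hypothetical, and the adjustment rule (twice the largest recently recorded gap of empty polls, clamped to a range) is an assumed stand-in, since the abstract does not specify how the upper limit is derived from past counting records.

```python
from collections import deque


class DynamicCountingPoller:
    """Toy model of dynamic counting polling (names and rule are assumptions)."""

    def __init__(self, poll_cq, wait_for_interrupt,
                 min_limit=4, max_limit=1024, history=8):
        # Hypothetical callbacks standing in for the RDMA completion queue:
        # poll_cq() returns a (possibly empty) list of completions,
        # wait_for_interrupt() blocks until a completion event arrives.
        self.poll_cq = poll_cq
        self.wait_for_interrupt = wait_for_interrupt
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.limit = min_limit                 # current upper limit of the counter
        self.records = deque(maxlen=history)   # past counts of empty polls

    def _adjust_limit(self):
        # Assumed rule: tolerate twice the largest recent gap, within bounds.
        if self.records:
            target = 2 * max(self.records)
            self.limit = max(self.min_limit, min(self.max_limit, target))

    def run_once(self):
        """Wait for one interrupt, then poll until the counter hits the limit."""
        self.wait_for_interrupt()
        handled = []
        empty = 0                              # consecutive empty polls
        while empty < self.limit:
            completions = self.poll_cq()
            if completions:
                handled.extend(completions)
                self.records.append(empty)     # remember how long we waited
                empty = 0
            else:
                empty += 1
        self._adjust_limit()                   # leave polling mode, retune limit
        return handled


if __name__ == "__main__":
    # Scripted completion queue: "a" arrives, then three empty polls, then "b".
    script = iter([["a"], [], [], [], ["b"]] + [[]] * 16)
    poller = DynamicCountingPoller(lambda: next(script), lambda: None)
    print(poller.run_once(), poller.limit)
```

In this sketch a short gap between completions keeps the thread in polling mode, so the second completion is picked up without another interrupt, while a long idle stretch exhausts the counter and returns the thread to interrupt mode, which bounds the CPU time spent spinning.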