Computing performance has grown much faster than storage performance, exacerbating the "storage wall" problem of the traditional von Neumann architecture. Computational storage, which offloads computing resources into the storage device so that computation runs closer to the data and data movement is reduced, has become a research hotspot because it alleviates the "storage wall" problem. However, existing research on computational storage relies on customized hardware/software co-designed platforms whose hardware and software systems are highly inconsistent, lack common programming interfaces and runtime environments, and scale poorly. The exploration of computational storage applications is also limited by incompatible systems and diverse interface protocols. The Computational Storage System Emulation Platform (CSSEP) is proposed to solve three problems: limited application scenarios, customized hardware, and proprietary interface protocols. CSSEP removes the hardware dependence and limitations of existing computational storage systems by designing and implementing full-system emulation of computational storage on a general-purpose computer architecture. CSSEP uses a computational storage operating system (CSOS) as its computing unit, which makes flash memory management and computing tasks independent of each other. With a flexible and scalable design space, CSSEP can both adjust various parameters of the solid-state disk (SSD) and deploy typical computational storage applications on the disk. To enhance the computing capability of the SSD, CSSEP uses graphics processing units (GPUs) instead of Field Programmable Gate Arrays (FPGAs) to implement a software-defined hardware acceleration model. Bandwidth measurements between CSSEP components show that, with I/O stacks rebuilt on the Storage Performance Development Kit (SPDK), the in-disk read and write bandwidths of CSOS are 9.7 to 15.4 times and 4.1 to 6.7 times the out-of-disk bandwidths, respectively. Based on the CSSEP platform, computational storage applications such as compression, decompression, face set detection, and offline deduplication are designed and implemented. Experimental results show that, after being offloaded to CSOS, the I/O tasks (decompression and offline deduplication) perform 116% to 126% and 30.9% to 52.6% better, respectively, than on the host. After being offloaded to CSOS with the accelerator, the computing tasks (compression and face set detection) are improved on average by 2.11 times and 43.7 times, respectively, compared with the host.
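Offline deduplication, one of the I/O tasks offloaded to CSOS, removes duplicate data after it has already been written to the device rather than on the write path. The abstract does not describe the algorithm CSOS uses, so the following is only a minimal sketch of one common approach, assuming fixed-size chunking and SHA-256 fingerprints. The names `offline_dedup`, `restore`, `DedupResult`, and the 4 KiB `CHUNK_SIZE` are hypothetical and do not come from CSSEP.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical chunk size in bytes; CSSEP's actual chunking is not specified.
CHUNK_SIZE = 4096


@dataclass
class DedupResult:
    # fingerprint -> single stored copy of the chunk
    unique_chunks: dict[str, bytes] = field(default_factory=dict)
    # ordered fingerprints that rebuild the original data
    recipe: list[str] = field(default_factory=list)


def offline_dedup(data: bytes, chunk_size: int = CHUNK_SIZE) -> DedupResult:
    """Deduplicate already-written data using fixed-size chunks and SHA-256."""
    result = DedupResult()
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]  # last chunk may be shorter
        fp = hashlib.sha256(chunk).hexdigest()
        # Keep only the first copy of each distinct chunk. This sketch trusts the
        # hash; a real system might also compare bytes to rule out collisions.
        result.unique_chunks.setdefault(fp, chunk)
        result.recipe.append(fp)
    return result


def restore(result: DedupResult) -> bytes:
    """Rebuild the original byte stream from the chunk recipe."""
    return b"".join(result.unique_chunks[fp] for fp in result.recipe)


if __name__ == "__main__":
    data = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    r = offline_dedup(data)
    stored = sum(len(c) for c in r.unique_chunks.values())
    print(len(r.recipe), len(r.unique_chunks), stored)
    assert restore(r) == data
```

For the sample input of four 4 KiB chunks, three of which are identical, only two unique chunks (8192 bytes) are kept instead of 16384. Because the pass mostly reads and hashes stored data, it is I/O-bound, which is consistent with the abstract grouping deduplication with the I/O tasks that benefit from higher in-disk bandwidth.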