| MongoDB is a popular non-relational database that usually uses high-performance solid-state disks (SSDs) as its storage devices. However, because its writes are unbalanced, the data on the SSDs becomes fragmented. Multi-streamed SSDs can alleviate data fragmentation, but they require a stream management strategy that estimates the expected lifetime of data so that the SSD can place the data properly. Existing manual stream management strategies adopt a file-type-based method, which cannot separate hot and cold data within a file. Meanwhile, existing automatic stream management strategies adopt a method based on the logical block address of the SSD, which cannot leverage data type semantics to assign streams. To solve the write performance problem of MongoDB on SSDs, a MongoDB-based Adaptive Stream Management Strategy (MAStream) for multi-streamed SSDs is proposed. MAStream designs stream management strategies at multiple levels of the SPDK (Storage Performance Development Kit) user-mode I/O stack. The adaptive stream strategy is designed in the file system layer: it first assigns stream IDs to metadata files and journal files in units of files, then assigns stream IDs to data files by hot and cold halves, and finally assigns stream IDs to index files, whose hot and cold data are randomly distributed, in units of data chunks, thus achieving efficient and accurate stream assignment. The virtual stream secondary assignment strategy is designed in the general block layer; by introducing the concept of a virtual stream, it provides an unlimited number of virtual streams to the adaptive stream strategy. A corresponding algorithm maps virtual streams with similar heat to the same physical stream. In the device driver layer, the physical stream management mechanism is designed on top of the NVMe Directive mechanism to obtain physical stream information and issue multi-stream writes. With this multi-level stream management design, MAStream achieves good write performance. A prototype of MAStream is implemented based on open-source MongoDB and SPDK, and its performance is evaluated on the Huawei ES3600 P V5 multi-streamed SSD. The evaluation results show that MAStream achieves better performance and lower write amplification under workloads with a high write ratio. Compared with the original MongoDB, performance is improved by 14.1% to 24.4% and write amplification is reduced by 11.1% to 18.4%. Compared with the manual stream management strategy Manual Stream, performance is improved by 8% to 16.5% and write amplification is reduced by 8.3% to 14.9%. Compared with the automatic stream management strategy Auto Stream, performance is improved by 4.1% to 8.1% and write amplification is reduced by 5% to 8.8%. |
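
The abstract does not spell out how virtual streams with similar heat are mapped to the same physical stream, so the following is only a minimal sketch of one plausible approach, not the MAStream algorithm itself. It assumes each virtual stream carries a single scalar heat value (for example, an update frequency) and that the device exposes a fixed number of physical streams; the function name `map_virtual_to_physical` and its parameters are hypothetical. Virtual streams are ranked by heat and split into contiguous groups, so streams with neighboring heat values share a physical stream.

```python
from typing import Dict


def map_virtual_to_physical(heat: Dict[int, float], num_physical: int) -> Dict[int, int]:
    """Map virtual stream IDs to physical stream IDs by heat rank.

    Assumption (not from the source): `heat` maps each virtual stream ID
    to one scalar heat value, and physical streams are numbered
    0 .. num_physical - 1, with 0 receiving the coldest group.
    """
    if num_physical < 1:
        raise ValueError("at least one physical stream is required")

    # Order virtual streams from coldest to hottest; ties break on stream ID
    # so the result is deterministic.
    ordered = sorted(heat, key=lambda vid: (heat[vid], vid))
    total = len(ordered)

    mapping: Dict[int, int] = {}
    for rank, vid in enumerate(ordered):
        # Contiguous ranks fall into the same bucket, so virtual streams
        # with similar heat end up on the same physical stream.
        mapping[vid] = rank * num_physical // total
    return mapping


if __name__ == "__main__":
    # Hypothetical heat values for four virtual streams and two physical streams.
    example_heat = {0: 0.90, 1: 0.10, 2: 0.85, 3: 0.15}
    print(map_virtual_to_physical(example_heat, num_physical=2))
```

Rank-based grouping keeps the physical streams evenly populated; a heat-threshold or clustering scheme would be an equally valid alternative when heat values are strongly skewed.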