
Collaborative Near Data Processing Technology For Distributed Object Storage

Posted on: 2021-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Z Deng
GTID: 2518306104494564
Subject: Computer system architecture
Abstract/Summary:
To reduce the overhead of moving data during processing, Near Data Processing (NDP) processes data close to where it is stored. In a distributed object storage system, the storage nodes can be used not only to store data but also to perform near data processing. However, existing NDP schemes for storage systems cannot fully exploit the resources of the many storage nodes to meet the demand for near data processing. This thesis first designs and implements a local NDP scheme based on storage nodes and then reproduces an existing remote NDP scheme. Tests of both schemes expose their limitations. Building on these findings, the thesis proposes a Collaborative Near Data Processing (CNDP) scheme, whose core idea is that an NDP task is completed jointly by multiple computing units.

CNDP consists of five parts: a proxy module, a storage module, a trigger, a service module, and executors. It serves two types of requests at the same time: NDP requests and ordinary I/O requests. Data targeted by an NDP request is processed by the corresponding NDP application either after it is read from the storage module or before it is written to it. The proxy module receives requests and returns responses. The storage module serves ordinary I/O requests. The trigger intercepts NDP requests and uses each request's metadata to determine which NDP application processes the data. The service module schedules NDP requests to the appropriate executor. NDP applications are functions written and deployed in advance according to user needs. Each executor is built on the Docker container engine, consists of an NDP application together with its runtime environment, and runs as a container on a storage node. Besides scheduling NDP requests, the service module, which is built on the container orchestration tool Kubernetes, also limits and scales executor resources, reducing the impact of near data processing on the performance of the storage system.

Prototypes of CNDP and three other typical data processing schemes, SDP, ZNDP, and TDP, are implemented, tested, and compared in three common data processing scenarios: data encryption, data compression, and data decompression. The experimental results show the following when CNDP uses fewer nodes than TDP and ZNDP. For encryption, CNDP request latency is 41.5% lower than that of SDP, and the average latency difference between CNDP and the other schemes does not exceed 13.5%. For compression, CNDP request latency is 40% lower than that of SDP and 25% lower than that of TDP, and the performance gap between CNDP and ZNDP does not exceed 5.4%. For decompression, CNDP request latency is 40% lower than that of SDP, TDP request latency is 29.8% lower than that of CNDP, and the performance gap between CNDP and ZNDP does not exceed 4%. With the same number of nodes, CNDP request latency for data encryption is 18.3% lower than that of ZNDP.
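The request path described in the abstract can be illustrated with a minimal in-process Python sketch. It is not the thesis's implementation. The following are all assumptions made for illustration: the class names (`Proxy`, `StorageModule`, `ServiceModule`, `Executor`), the `trigger` function, the `"ndp-app"` metadata key, the least-loaded scheduling policy, and the use of `zlib` as the compression and decompression NDP applications. The executors here are plain Python objects rather than Docker containers, and the sketch does not model Kubernetes resource limits or scaling.

```python
# Hypothetical sketch of a CNDP-style request path; not the thesis's code.
# Assumed: all names, the "ndp-app" metadata key, least-loaded scheduling,
# and zlib standing in for the compression/decompression NDP applications.
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

NdpApp = Callable[[bytes], bytes]  # an NDP application: bytes in, bytes out

@dataclass
class Request:
    op: str  # "GET" or "PUT"
    key: str
    body: bytes = b""
    metadata: Dict[str, str] = field(default_factory=dict)

@dataclass
class Executor:
    """Stands in for one container running a single NDP application."""
    name: str
    app: NdpApp
    in_flight: int = 0  # requests currently being processed

    def run(self, data: bytes) -> bytes:
        self.in_flight += 1
        try:
            return self.app(data)
        finally:
            self.in_flight -= 1

class StorageModule:
    """Serves ordinary object I/O from an in-memory dict."""
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

class ServiceModule:
    """Schedules each NDP request to an executor of the requested application."""
    def __init__(self) -> None:
        self.pools: Dict[str, List[Executor]] = {}

    def register(self, app_name: str, executor: Executor) -> None:
        self.pools.setdefault(app_name, []).append(executor)

    def dispatch(self, app_name: str, data: bytes) -> bytes:
        # Assumed policy: pick the executor with the fewest in-flight requests.
        executor = min(self.pools[app_name], key=lambda e: e.in_flight)
        return executor.run(data)

def trigger(req: Request) -> Optional[str]:
    """Return the NDP application named in the metadata, or None for ordinary I/O."""
    return req.metadata.get("ndp-app")

class Proxy:
    """Receives requests and returns responses."""
    def __init__(self, storage: StorageModule, service: ServiceModule) -> None:
        self.storage = storage
        self.service = service

    def handle(self, req: Request) -> bytes:
        app = trigger(req)
        if req.op == "GET":
            data = self.storage.get(req.key)
            # NDP read: process the data after it is read from storage.
            return data if app is None else self.service.dispatch(app, data)
        if req.op == "PUT":
            # NDP write: process the data before it is written to storage.
            data = req.body if app is None else self.service.dispatch(app, req.body)
            self.storage.put(req.key, data)
            return b""
        raise ValueError(f"unsupported op: {req.op}")

# Usage: compress on write, decompress on read.
storage, service = StorageModule(), ServiceModule()
service.register("compress", Executor("compress-0", zlib.compress))
service.register("decompress", Executor("decompress-0", zlib.decompress))
proxy = Proxy(storage, service)

payload = b"near data processing " * 100
proxy.handle(Request("PUT", "obj1", payload, {"ndp-app": "compress"}))
print(len(payload), len(storage.get("obj1")))  # the stored object is smaller
print(proxy.handle(Request("GET", "obj1", metadata={"ndp-app": "decompress"})) == payload)  # True
```

The trigger only inspects request metadata. A request without the assumed `ndp-app` key therefore bypasses the service module entirely. This mirrors the abstract's point that ordinary I/O and NDP requests are served side by side by the same proxy.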
Keywords/Search Tags: near data processing, distributed object storage, container, data movement