| Deep learning is widely used in image processing, speech recognition, natural language processing, and other fields. However, as deep learning methods place ever higher demands on training speed and data-processing capacity, traditional single-machine training can no longer meet these needs. Distributed deep learning has therefore become an effective way to scale computing power, especially when large volumes of data must be processed: by training a deep learning model in parallel across multiple machines, it can both accelerate training and improve model accuracy. Meanwhile, the rise of large-scale distributed Internet of Things (IoT) applications has created strong demand for training and inference of deep neural networks (DNNs) at the edge. Constrained by centralized data-transmission mechanisms, the heterogeneity of edge devices, and limited resources, existing purely data-parallel or purely model-parallel distributed training mechanisms often cannot fully exploit the computing power, network topology, and bandwidth of edge devices, and sometimes cannot be applied at the edge at all. Distributed training at the edge places high demands on computing resources, on a reasonable partitioning of the training network, and on the quality of data distribution; building distributed training on the computing power and I/O capabilities of edge devices is therefore an active research topic. To this end, this paper focuses on the above problems of edge distributed training and studies related techniques under resource constraints, dynamic environments, and uneven data. The main contributions are as follows:

(1) This paper proposes EdgeMesh, a hybrid edge distributed training framework built on TensorFlow and Mesh-TensorFlow that is large-scale and broadly adaptable. EdgeMesh integrates the two modes of data parallelism and model parallelism and exercises fine-grained control over convolutional-layer operations and parameter transfer. It adopts a parameter-server architecture for model parallelism across multiple devices, and a parameter server is also added between clusters so that, without affecting model accuracy, the framework both alleviates the resource limitations of DNN training on edge devices and accelerates model training. Experiments show that, compared with single-machine training and the data-parallel mode, the EdgeMesh hybrid edge distributed training framework significantly reduces average memory consumption while maintaining equivalent model accuracy.

(2) This paper proposes an adaptive strategy for the edge hybrid training mesh and a model-parallel convolution-filter partitioning algorithm to accelerate hybrid distributed training at the edge. The adaptive strategy derives a layout suited to the hybrid distributed training framework according to differences in the floating-point computing capabilities of edge devices, thereby mitigating the large training delays caused by performance differences between clusters. The convolution-filter partitioning algorithm is a model-parallel dynamic load-balancing algorithm that accounts for the bandwidth of the operating environment, device floating-point performance, memory, and network latency; it aims to partition convolution filters to suit the current edge environment and to reduce synchronization latency. Both the adaptive mesh strategy and the filter partitioning algorithm are deployed in the EdgeMesh framework. Compared with the baseline algorithm, the resulting Het-EdgeMesh not only effectively reduces training delay on heterogeneous edge devices and speeds up DNN model training, but also scales well. Experiments show that, compared with single-machine training and the data-parallel mode, the EdgeMesh distributed training mechanism reduces average delay by up to 3.2x and average memory overhead by up to 43% while maintaining model accuracy. |
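The core idea behind the filter-partitioning algorithm in (2) — splitting a convolutional layer's filters across devices in proportion to their compute capability — can be illustrated with a minimal sketch. This is a simplified assumption of the approach (proportional split by floating-point throughput only, with largest-remainder rounding), not the paper's actual algorithm, which also weighs bandwidth, memory, and network latency; the function name is hypothetical.

```python
def partition_filters(num_filters, device_flops):
    """Split `num_filters` convolution filters across devices in
    proportion to their floating-point throughput.

    Uses largest-remainder rounding so every filter is assigned
    exactly once. Faster devices receive proportionally more filters,
    balancing per-device compute time in a model-parallel layer.
    """
    total = sum(device_flops)
    # Ideal (fractional) share of filters for each device.
    ideal = [num_filters * f / total for f in device_flops]
    shares = [int(x) for x in ideal]
    # Hand leftover filters to the devices with the largest remainders.
    leftover = num_filters - sum(shares)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - shares[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares

# e.g. a 64-filter layer across devices rated 4, 2, and 1 TFLOPS
print(partition_filters(64, [4, 2, 1]))  # → [37, 18, 9]
```

In a real deployment the shares would determine how many output channels each device computes for the layer, with the partial activations concatenated (and synchronized) afterwards.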