An increasing number of edge devices are connected to the Internet, generating high volumes of data at the network edge. However, due to high communication costs and privacy concerns, it is impractical to transmit such a tremendous amount of data from the data sources to the cloud for building AI models. To address these challenges, distributed edge learning has been proposed to deploy distributed machine learning models at the network edge. Most existing training frameworks for distributed edge learning are centralized, which leads to drawbacks such as communication bottlenecks, poor scalability, and a single point of failure. Existing works therefore designed hierarchical model training frameworks that build a tree-based aggregation structure according to the network topology and data distribution. However, because resource heterogeneity is a key characteristic of edge computing environments, the classical fully synchronous SGD parallel method is not suitable for the hierarchical model aggregation structure in heterogeneous edge environments: devices with fast training speed must wait for slow devices, wasting resources. In addition, resources in the edge environment are dynamic, so offline aggregation frequency optimization methods cannot adapt to the changing environment in real time. This paper focuses on optimizing model aggregation frequencies in the hierarchical model aggregation structure while accounting for the heterogeneity and dynamic changes of resources in the edge environment. The main work is as follows:

(1) To cope with the heterogeneity of edge resources, we propose weak synchronization, in which edge devices at the same level use different frequencies for local updates and/or model aggregations. Weak synchronization aims to improve the resource utilization of edge devices, the convergence speed of the model, and the final training accuracy. Building on it, we design a resource-based aggregation frequency control method, termed RAF, which determines the aggregation frequencies of edge devices that minimize the loss function under heterogeneous resources. RAF reduces waiting time and fully utilizes the resources of edge devices. However, when the aggregation frequencies differ greatly across devices, RAF can suffer from inferior error convergence; we therefore adjust the aggregation frequencies dynamically in different phases of model training. We evaluated RAF via extensive experiments, and the results demonstrate that RAF outperforms the benchmark approaches in terms of learning accuracy and convergence speed.
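The following minimal sketch illustrates the idea of weak synchronization under heterogeneous resources. The toy linear-regression loss, the FedAvg-style weighted averaging, and the fixed per-device frequencies `tau` are illustrative assumptions only, not the exact RAF formulation (which derives the frequencies from measured device resources).

```python
import numpy as np

def local_sgd(weights, data, steps, lr=0.01):
    """Run `steps` local SGD updates on one device (toy linear-regression loss)."""
    x, y = data
    for _ in range(steps):
        grad = 2 * x.T @ (x @ weights - y) / len(y)  # gradient of mean squared error
        weights = weights - lr * grad
    return weights

def weak_sync_round(global_w, devices):
    """One weak-synchronization round: each device runs its own number of local
    updates (tau), then the edge node averages the returned models."""
    local_models, sizes = [], []
    for dev in devices:
        w_i = local_sgd(global_w.copy(), dev["data"], steps=dev["tau"])
        local_models.append(w_i)
        sizes.append(len(dev["data"][1]))
    sizes = np.array(sizes, dtype=float)
    # FedAvg-style weighted average by local data size
    return np.average(local_models, axis=0, weights=sizes / sizes.sum())

# A fast device runs 10 local steps per round while a slow one runs only 2,
# so neither has to idle at the aggregation point.
rng = np.random.default_rng(0)
make_data = lambda n: (rng.normal(size=(n, 3)), rng.normal(size=n))
devices = [{"data": make_data(40), "tau": 10}, {"data": make_data(60), "tau": 2}]
w = np.zeros(3)
for _ in range(5):
    w = weak_sync_round(w, devices)
```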
(2) To handle the dynamic changes of resources in the edge environment, we present an online optimization method for the model aggregation structure and aggregation frequency, termed Auto SF. The edge environment changes periodically during model training. When the environment changes, we adopt Auto Control, which combines an evaluator (a resource model plus an ML-based predictor) with a random-search optimizer to automatically control the aggregation structure and aggregation frequency; Auto Control adapts well to changes in network topology and bandwidth resources. When the environment does not change, we design a heuristic algorithm, termed Adjust Cluster, that adjusts the aggregation structure and aggregation frequency based on observations made during model training to alleviate the non-IID data problem. Extensive experimental results show that Auto SF significantly improves convergence speed and accuracy compared to the benchmark algorithms.
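The sketch below illustrates the evaluator-plus-random-search pattern behind Auto Control. For brevity it searches only over per-device aggregation frequencies rather than the aggregation structure as well, and both the resource model and the progress predictor are simplified stand-ins for the resource model and ML-based predictor described above; all names and numbers are hypothetical.

```python
import math
import random

def estimate_round_time(freqs, resources):
    """Resource model (illustrative): a round ends when the slowest device has
    finished its local updates and uploaded its model."""
    return max(freqs[d] * resources[d]["time_per_step"] + resources[d]["upload_time"]
               for d in freqs)

def predict_progress(freqs):
    """Stand-in for the ML-based predictor: assume diminishing returns from
    additional local updates within a round (purely illustrative)."""
    return sum(math.log(1 + f) for f in freqs.values())

def auto_control(devices, resources, budget=200, freq_range=(1, 20)):
    """Random search over per-device aggregation frequencies; each candidate is
    scored by predicted training progress per unit of estimated wall-clock time."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        freqs = {d: random.randint(*freq_range) for d in devices}
        score = predict_progress(freqs) / estimate_round_time(freqs, resources)
        if score > best_score:
            best_cfg, best_score = freqs, score
    return best_cfg

# Example: two fast devices and one slow, bandwidth-limited device.
resources = {
    "dev0": {"time_per_step": 0.05, "upload_time": 0.4},
    "dev1": {"time_per_step": 0.05, "upload_time": 0.4},
    "dev2": {"time_per_step": 0.30, "upload_time": 1.5},
}
print(auto_control(list(resources), resources))
```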