With the rapid advancement of the Internet of Things (IoT) and artificial intelligence (AI), sensor-based smart devices, spanning smart homes, smart transportation, and smart healthcare, are becoming increasingly popular in people's daily lives. The data generated by these devices are of great value for training intelligent application models. However, the traditional centralized training approach based on cloud servers is inadequate for training models on mobile devices because of constraints such as communication overhead, data security, and privacy regulations. As a result, distributed learning has attracted significant research attention in recent years. Despite its potential advantages, the heterogeneity of device data can lead to significant inconsistencies in data distribution across devices. These inconsistencies, which may arise from differences in user groups, geographical associations, and other factors, can severely degrade the performance of distributed learning models. Therefore, studying the non-independent and identically distributed (non-IID) characteristics of device data, while ensuring the protection of the original data's privacy, is critical for improving model performance and reducing communication overhead in artificial intelligence development.

We aim to tackle this issue by optimizing the performance of distributed learning under data heterogeneity from two perspectives. First, to address the low performance of the globally shared model under data heterogeneity, we propose a novel distributed learning algorithm that hierarchically clusters devices according to the data distribution characteristics of each device and introduces an adaptive dataset condensation algorithm that adapts to different devices. Second, because a single shared model cannot satisfy the requirements of all clients, we further personalize the model on each device using the computed global model, ensuring that the model is better tailored to the task on each device. The specific research contents are outlined below.

(1) To address the low performance of the globally shared model under data heterogeneity, we propose a distributed learning algorithm based on hierarchical clustering and adaptive dataset condensation. The algorithm trains the global model by collecting synthetic datasets computed locally by each client. First, we propose a novel client-side hierarchical clustering algorithm and establish an entropy-weight TOPSIS comprehensive evaluation model, which scores and stratifies clients according to their data distribution characteristics and thereby guides the subsequent adaptive dataset condensation process. We then introduce an adaptive dataset condensation algorithm that lets each client learn a mapping from its original dataset to a synthetic dataset, conditioned on its position in the hierarchical structure and its data distribution characteristics. Detailed theoretical analysis and experimental results demonstrate that the proposed algorithm trains a global model with better performance under data heterogeneity. Moreover, ablation experiments show that the proposed algorithm achieves a good trade-off between prediction accuracy and communication overhead.

(2) To address the fact that the global model cannot meet the requirements of all clients, we propose a personalized distributed learning algorithm with two-stage optimization. The global model is personalized in two phases: pre-start and in-training. In the pre-start phase, we propose a client similarity measurement model based on a similarity matrix, which clusters clients and trains a model for each client cluster; clients within the same cluster can thus collaborate to train a personalized model with better performance. In the in-training phase, we prevent over-personalization by decoupling the neural network into a base layer and a personalization layer and introducing a regularization term into the local objective function. Experimental results demonstrate that the proposed algorithm trains personalized models with higher average accuracy, and the variance of accuracy across clients is also lower than that of existing personalization algorithms.
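The entropy-weight TOPSIS evaluation named in contribution (1) is a classical scoring scheme. Since the abstract does not give its formulas, the following is a minimal generic sketch of that method, not the thesis's exact model; the per-client statistics in `X` (label entropy, sample count, class coverage) are purely illustrative assumptions.

```python
import numpy as np

def entropy_weights(X):
    """Entropy-weight method: criteria with more dispersion across
    clients receive larger weights. X: (n_clients, n_criteria),
    all entries positive, larger = better."""
    P = X / X.sum(axis=0, keepdims=True)           # column-wise proportions
    logP = np.where(P > 0, np.log(P), 0.0)         # treat 0*log(0) as 0
    e = -(P * logP).sum(axis=0) / np.log(len(X))   # entropy per criterion
    d = 1.0 - e                                    # degree of diversification
    return d / d.sum()

def topsis_scores(X, w):
    """TOPSIS closeness coefficient in [0, 1] for each client (row)."""
    V = X / np.linalg.norm(X, axis=0, keepdims=True) * w  # weighted, normalized
    best, worst = V.max(axis=0), V.min(axis=0)            # ideal solutions
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    return d_worst / (d_best + d_worst)

# Hypothetical per-client statistics: [label entropy, sample count, class coverage]
X = np.array([[0.9, 500, 10],
              [0.4, 120,  4],
              [0.7, 300,  8]], dtype=float)
w = entropy_weights(X)
scores = topsis_scores(X, w)
tiers = np.argsort(-scores)   # rank order used to stratify clients
```

In this sketch the closeness scores could then be thresholded or rank-bucketed to place each client in the hierarchy that conditions its dataset condensation.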
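Contribution (2)'s in-training phase decouples the network into a base layer and a personalization layer and regularizes the local objective. A minimal single-client sketch of that general idea follows; the tiny tanh/sigmoid network, the proximal coefficient `mu`, and the toy data are all illustrative assumptions, not the thesis's actual architecture. The base weights carry a penalty pulling them toward the server's copy, while the personalization head is trained purely locally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy local data for one client (hypothetical): label is the sign of feature 0
X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float)

# Decoupled model: shared "base" projection + purely local "personalization" head
W_base = rng.normal(scale=0.1, size=(8, 4))   # regularized toward the server copy
w_head = rng.normal(scale=0.1, size=4)        # never leaves the client
W_global = W_base.copy()                      # base weights received from server
mu, lr = 0.1, 0.3                             # proximal strength, step size

def forward(Xb):
    h = np.tanh(Xb @ W_base)                  # base layer
    p = 1 / (1 + np.exp(-(h @ w_head)))       # personalization layer (sigmoid)
    return h, p

for _ in range(300):
    h, p = forward(X)
    err = p - y                               # dCE/dlogit for sigmoid + CE loss
    g_head = h.T @ err / len(X)
    g_h = np.outer(err, w_head) * (1 - h**2)  # backprop through tanh
    # local objective: cross-entropy + (mu/2) * ||W_base - W_global||^2
    g_base = X.T @ g_h / len(X) + mu * (W_base - W_global)
    w_head -= lr * g_head
    W_base -= lr * g_base

_, p = forward(X)
acc = ((p > 0.5) == y).mean()
```

The proximal term keeps the base layer close to the global model (limiting over-personalization), while the head is free to specialize to the client's local task.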