Machine learning training often requires massive amounts of data, and the richness and diversity of that data largely determine the quality of the resulting model. The explosive growth of data in the big-data era has driven the rapid development of distributed machine learning training, but current discussions mostly focus on high-performance computing cluster environments. In real production environments, however, these valuable data resources have owners: the companies and organizations that hold the data weigh privacy, commercial interest, and other factors, which makes it difficult to gather the data in one place before cluster training. This paper therefore designs a remote parallel training algorithm for machine learning that avoids migrating data across organizations.

Compared with a high-performance computing cluster environment, designing training algorithms for cross-organization remote collaboration also requires solving technical problems such as uneven data distribution and differences in the capabilities of the training nodes. Moreover, because the participating organizations are peers and do not want to introduce a third party to mediate, the traditional centralized star topology is no longer applicable.

To address these problems, this paper first proposes a clustered, centerless mixed gradient descent method. (1) It analyzes the organizational characteristics of cross-organization computing nodes and proposes a clustered, centerless collaboration model. The model retains the inherent advantages of the traditional parameter-server model while treating the main contradictions within an organization and between organizations separately, and its centerless structure better matches the peer relationship between organizations. (2) It proposes a distributed gradient descent training algorithm based on this collaboration model. The algorithm adopts different synchronization methods for intra-cluster and inter-cluster collaboration according to their different characteristics; in particular, inter-cluster collaboration uses the non-central limited asynchronous protocol proposed in this paper, which achieves limited asynchrony without the centralized management of a global parameter server.

To improve the effectiveness of the clustered centerless mixed gradient descent method, this paper further proposes optimizations for uneven load within clusters and for communication between clusters. (1) To improve the execution efficiency of intra-cluster training and prevent slow nodes stalled at the synchronization barrier from reducing system throughput, an intra-cluster load balancing algorithm is proposed. Unlike general load balancing algorithms, it takes the degree of trust between nodes into consideration and strikes a compromise between security and efficiency. (2) Considering that inter-cluster communication must cross third-party networks, and in order to improve its security and efficiency, an inter-cluster communication optimization protocol is proposed based on the characteristics of inter-cluster update information. Compared with general broadcast communication, this protocol effectively reduces the inter-cluster traffic required for training.
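To make the overall flow concrete, the following is a minimal, hypothetical sketch of a clustered, centerless training loop of this kind, not the algorithm as specified in this paper. All names and constants (for example STALENESS_BOUND, the toy linear-regression objective, and the two hard-coded clusters) are illustrative assumptions: nodes within a cluster average gradients synchronously, while clusters exchange models peer-to-peer under a simple staleness check instead of reporting to a global parameter server.

```python
# Minimal sketch (not the thesis implementation): clustered, centerless
# "mixed" gradient descent on a toy linear-regression objective.
# Assumptions made for illustration: numpy only, full-batch gradients,
# gossip-style averaging between clusters, and a staleness bound that
# stands in for the paper's limited-asynchronous protocol.
import numpy as np

np.random.seed(0)
DIM, STALENESS_BOUND, ROUNDS, LR = 5, 2, 200, 0.1

def make_node(n_samples):
    """Each node holds its own private data; raw data never leaves the node."""
    X = np.random.randn(n_samples, DIM)
    w_true = np.arange(1.0, DIM + 1.0)
    y = X @ w_true + 0.01 * np.random.randn(n_samples)
    return X, y

def local_gradient(w, X, y):
    """Gradient of the mean squared error on one node's local data."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

# Two clusters (organizations) with unevenly sized nodes; no global server.
clusters = [
    {"nodes": [make_node(40), make_node(120)], "w": np.zeros(DIM), "round": 0},
    {"nodes": [make_node(80), make_node(60), make_node(30)], "w": np.zeros(DIM), "round": 0},
]

for t in range(ROUNDS):
    for c in clusters:
        # Intra-cluster step: synchronous averaging of member-node gradients,
        # weighted by local sample counts (nodes inside one organization
        # synchronize every round).
        grads = [local_gradient(c["w"], X, y) for X, y in c["nodes"]]
        sizes = np.array([len(y) for _, y in c["nodes"]], dtype=float)
        c["w"] -= LR * np.average(grads, axis=0, weights=sizes)
        c["round"] += 1

    # Inter-cluster step: limited-asynchronous, centerless exchange.
    # A cluster only averages with peers whose progress is within the
    # staleness bound, so no global parameter server is needed.
    for c in clusters:
        peers = [p for p in clusters
                 if p is not c and abs(p["round"] - c["round"]) <= STALENESS_BOUND]
        if peers:
            c["w"] = np.mean([c["w"]] + [p["w"] for p in peers], axis=0)

print("cluster models:", [np.round(c["w"], 2) for c in clusters])
```

In a full system, the intra-cluster step would additionally apply the trust-aware load balancing described above, and the inter-cluster step would use the proposed communication optimization protocol rather than plain pairwise broadcast.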
The effectiveness of the above methods has been verified through theoretical analysis and simulation, providing a set of safe, stable, convergence-guaranteed solutions for distributed machine learning training across organizations. This solution avoids migrating the original data, eliminates concerns between organizations, and connects "data islands", so that the value of the data can be mined to a greater extent.