In the age of big data, the value of data is widely recognized, and parallel computing is used to mine it. Such parallel computing typically runs on a dynamic network, also called a dynamic parallel environment, in which a data mining problem is split into multiple tasks that execute simultaneously on distributed nodes connected over the network. When a task and its data are not on the same node, the task must fetch the data across the network. This data transmission reduces the parallel speed-up, so optimizing data transmission in dynamic parallel environments is necessary. Task cooperative scheduling is an effective way to shorten data transmission time: instead of handling a single task, it schedules a set of cooperative tasks together in each scheduling round, so the optimization of data transmission also accounts for overall parallel performance. In this paper, an application denotes a set of cooperative tasks, and the makespan of an application represents the overall performance of parallel computing; in other words, task cooperative scheduling can reduce the makespan of an application by optimizing data transmission. However, current methods suit an individual or simple application but not more complex ones. Therefore, this paper studies the design of task cooperative scheduling methods for three complex scenarios: multiple applications, synchronous applications, and the special Deep Neural Network (DNN) application. The contributions are as follows:

(1) This paper reduces the multi-application makespan. In the cloud, an application comprises a set of parallel tasks with computational dependencies, and its makespan is affected by the tasks' data transmission time. One assignment scheduling rule, moving a task closer to its data, shortens the data transmission time and thus the makespan. However, previous methods are designed for a single application and fail to reduce the multi-application makespan: since it is unknown which application a task belongs to, the number of optimized tasks becomes unbalanced among applications, causing a large gap between their makespans. To address this issue, this paper proposes a task cooperative assignment scheduling method that considers the application-task relationship, with the goal of reducing the multi-application makespan. First, a cost-sharing game model is proposed to balance the optimized task assignments among multiple applications, where the cost refers to the makespan. Specifically, future network service quality is predicted and used to accurately estimate the makespan under a given task co-assignment. Then, a graph-based relaxation algorithm is improved so that task assignments can be adjusted within a short time. Finally, the task cooperative assignment is evaluated on a real Hadoop cluster under diverse network environments. The experimental results demonstrate that the multi-application makespan decreases; compared with the baseline schedulers, the longest makespan is reduced by 61.5%, and the volume of transferred data is reduced by more than 50%.
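To make the locality rule concrete, here is a minimal Python sketch of data-local co-assignment with a round-robin balance across applications. It is an illustrative assumption, not the thesis's cost-sharing game or graph-based relaxation algorithm; the function name, task fields, and slot model are all hypothetical.

```python
from collections import defaultdict

def co_assign(tasks, node_slots):
    """Place each task on the node holding its input data when a slot is
    free (no transfer), cycling over applications so that data-local
    placements stay balanced among them.

    tasks:      list of dicts with keys 'app', 'task', 'data_node'
    node_slots: dict mapping node -> number of free execution slots
    """
    by_app = defaultdict(list)
    for t in tasks:
        by_app[t['app']].append(t)

    assignment = {}
    queues = list(by_app.values())
    while any(queues):                    # until every queue is drained
        for q in queues:                  # round-robin over applications
            if not q:
                continue
            t = q.pop(0)
            node = t['data_node']
            if node_slots.get(node, 0) <= 0:
                # No free slot where the data lives: fall back to the
                # least-loaded node and accept a network transfer.
                node = max(node_slots, key=node_slots.get)
            node_slots[node] -= 1         # may go negative: task queues up
            assignment[t['task']] = node
    return assignment
```

For instance, if the tasks of two applications all prefer the same node, the round-robin loop alternates that node's free slots between them instead of letting one application take them all, which is the kind of balance the cost-sharing game formalizes.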
(2) This paper accelerates DNN distributed training, where tasks belong to multiple applications and those applications must be synchronized, i.e., synchronous applications arise; this setting has not been studied before. In DNN distributed training, cooperative tasks run on fast Graphics Processing Units (GPUs), so data transmission delays the overall training even more seriously. Moreover, as training iterates, the synchronization gap between applications accumulates and introduces large delays; in short, DNN distributed training is highly sensitive to the multi-application synchronization gap. For this new setting, this paper is the first to apply task cooperative assignment scheduling to reduce training delays. It further proposes a fine-grained perception of the multi-application synchronization gap: by perceiving the gap accurately, different task data transmission times are matched to the asynchronous applications, so the gap between the multi-application makespans shrinks and training is accelerated. In addition, DNN distributed training based on MapReduce is implemented, and the experiments are carried out on this implementation. The experimental results show that this method reduces training time by at least 50%.

(3) This paper makes parallel training of the Capsule Network (CapsNet) feasible. CapsNet is a special DNN architecture with a unique capsule component and an unsupervised clustering operation for computing capsules. Because of this special structure, the time saved by parallel acceleration is limited and is covered by the data transmission time, so CapsNet parallel training is slower than stand-alone training, and task cooperative assignment alone cannot remedy this. Therefore, this paper studies another task cooperative scheduling method, task partitioning, to reduce data transmission for CapsNet. Compared with traditional partitioning methods, the proposed method is small and precise. First, only the time-consuming tasks in capsule layers are partitioned, saving a massive amount of unnecessary data transmission. Second, partition tasks are selected precisely, reducing the data transmission scale without sacrificing parallel acceleration. The experimental results show that only the redesigned partitioning method achieves real acceleration, with a maximum speed-up of 1.7x over stand-alone training. Moreover, the proposed parallel strategy matches devices with limited computing power, which can increase the utilization of edge nodes in a heterogeneous computing-power network.
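As a rough illustration of the "small and precise" selection in contribution (3), the sketch below keeps only capsule-layer tasks whose compute time dominates the extra transfer a split would introduce. The profiling tuple and the ratio threshold are hypothetical placeholders, not the thesis's actual selection rule.

```python
def select_partition_layers(profile, ratio=0.5):
    """Pick the capsule-layer tasks worth splitting across workers.

    profile: list of (name, is_capsule, compute_s, transfer_s) tuples,
             where transfer_s estimates the extra communication that
             splitting the layer would introduce.
    ratio:   hypothetical threshold; splitting pays off only when the
             added transfer is a small fraction of the compute time.
    """
    chosen = []
    for name, is_capsule, compute_s, transfer_s in profile:
        if not is_capsule:
            continue                    # non-capsule layers stay whole
        if compute_s > 0 and transfer_s / compute_s < ratio:
            chosen.append(name)         # compute dominates: split it
    return chosen

# Example: only the heavy dynamic-routing layer clears the bar.
layers = [('conv1',   False, 0.8, 0.1),
          ('primary', True,  0.5, 0.4),   # 0.8 >= 0.5: too costly to split
          ('routing', True,  4.0, 0.6)]   # 0.15 < 0.5: partition it
print(select_partition_layers(layers))    # -> ['routing']
```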