
Multi-granularity Parallel Optimization Of Convolution Neural Networks

Posted on: 2018-04-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2428330515489685
Subject: Computer software and theory

Abstract/Summary:
A convolutional neural network (CNN) is a multi-stage, globally trained neural network model. In recent years it has been widely applied to image recognition, natural language processing, and related fields, owing to its simple model structure and high recognition accuracy. When trained serially, however, it suffers from long training times and poor flexibility, and as the problems to be solved grow more complex, the training data become larger and more complicated, so serially implemented CNNs are inefficient on massive data sets. The research in this thesis is a multi-granularity parallel optimization strategy for convolutional neural networks. First, the thesis introduces a parallel acceleration method based on a distributed cluster and the GPU architecture. Then, the network is trained under this parallel optimization structure. Finally, experiments show that the structure parallelizes well: compared with a serially trained single-machine CNN and with a common cluster-parallel architecture, it achieves a good speedup ratio while also improving parallel efficiency.

(1) A distributed cluster performs the coarse-grained parallel computing. MapReduce divides the whole training set into several small data blocks, which are stored on the individual nodes. Every node holds an identical copy of the CNN model and trains it in a data-parallel fashion on its pre-assigned blocks. The map task performs the forward-propagation and back-propagation computation, and its result is the local change of each weight and bias; the reduce task aggregates these local values to produce the global change. After several such iterations, training of the convolutional neural network is complete.

(2) A multi-granularity parallel optimization framework based on the cluster and the GPU is adopted. Coarse-grained, network-intensive computation uses the distributed-cluster approach, which decomposes large-scale tasks and distributes the data set across nodes; fine-grained, computation-intensive work is parallelized with GPU general-purpose computing.

(3) The thesis analyzes the communication overheads of task scheduling, load balancing, and data transmission, and proposes corresponding optimization strategies. For task scheduling, a task-scheduling thread pool manages the different kinds of tasks, which reduces the cost of creating threads and the frequency of destroying them, improving parallel efficiency. The load balancer monitors task execution in the system, the remaining computing capacity of each node, and the queue of submitted task requests; it then makes a scheduling decision and directs the thread pool to dispatch tasks accordingly, achieving load balance. To address the time overhead of data transmission, computation time is used to cover the data-transfer time, reducing the influence of transfer time on total training time.
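The coarse-grained data-parallel scheme in (1) can be sketched as follows. This is an illustrative toy, not the thesis code: a linear least-squares model stands in for the CNN, each map task computes local gradients on its data shard, and the reduce task averages them into a global update.

```python
import numpy as np

def map_task(weights, shard_x, shard_y):
    """Forward + backward pass on one data shard; returns the local gradient
    (here for least squares, standing in for the CNN's weight/bias changes)."""
    pred = shard_x @ weights
    return shard_x.T @ (pred - shard_y) / len(shard_y)

def reduce_task(local_grads):
    """Aggregate per-node local gradients into one global update direction
    (shards are equal-sized, so the mean equals the full-batch gradient)."""
    return np.mean(local_grads, axis=0)

def train(weights, shards, lr=0.1, iters=200):
    for _ in range(iters):
        grads = [map_task(weights, x, y) for x, y in shards]  # map phase, one per node
        weights = weights - lr * reduce_task(grads)           # reduce phase, global update
    return weights
```

In a real MapReduce deployment, the list comprehension in `train` is what runs distributed across nodes; only the small gradient vectors cross the network, not the data blocks.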
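For the fine-grained GPU parallelism in (2), a rough illustration (again not the thesis implementation) of why convolution suits GPU general-purpose computing: the im2col transformation turns convolution into one large matrix multiply, which GPUs execute with massive thread-level parallelism.

```python
import numpy as np

def im2col(img, k):
    """Unfold every k x k patch of a 2-D image into one row of a matrix."""
    h, w = img.shape
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            rows.append(img[i:i + k, j:j + k].ravel())
    return np.array(rows)

def conv2d(img, kernel):
    """Valid convolution via im2col: one dense matmul replaces the nested
    convolution loops, exposing the fine-grained parallelism a GPU exploits."""
    k = kernel.shape[0]
    out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
    return (im2col(img, k) @ kernel.ravel()).reshape(out_h, out_w)
```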
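The scheduling and load-balancing ideas in (3) can be sketched like this. The class and function names are illustrative, not from the thesis: a fixed thread pool amortizes thread creation/destruction cost, and a simple balancer assigns each task to the worker currently holding the fewest pending tasks.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

class LoadBalancer:
    """Tracks pending work per node and picks the least-loaded one."""
    def __init__(self, node_names):
        self.pending = {name: 0 for name in node_names}
        self.lock = threading.Lock()

    def pick_node(self):
        # Decision step: dispatch to the node with the fewest pending tasks.
        with self.lock:
            node = min(self.pending, key=self.pending.get)
            self.pending[node] += 1
            return node

    def done(self, node):
        with self.lock:
            self.pending[node] -= 1

def run_tasks(tasks, nodes):
    """Run callables through a fixed-size scheduling thread pool, with the
    balancer deciding which node each task is charged to."""
    balancer = LoadBalancer(nodes)

    def wrapped(task):
        node = balancer.pick_node()
        try:
            return node, task()
        finally:
            balancer.done(node)

    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(wrapped, tasks))
```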
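Finally, a minimal sketch of covering data-transfer time with computation time, as proposed in (3): while the current batch is being processed, a background thread prefetches the next one, so total time approaches the maximum of compute and transfer time rather than their sum. The function names are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

def train_with_overlap(load_batch, compute, num_batches):
    """Pipeline batch loading against computation with one prefetch slot."""
    with ThreadPoolExecutor(max_workers=1) as loader:
        future = loader.submit(load_batch, 0)              # start first transfer
        for b in range(num_batches):
            batch = future.result()                        # wait for current data
            if b + 1 < num_batches:
                future = loader.submit(load_batch, b + 1)  # prefetch next batch...
            compute(batch)                                 # ...while computing this one
```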
Keywords/Search Tags:Parallelization, Convolution neural network, Load balancing, Scheduling optimization, Communication overhead optimization