In recent years, deep learning has achieved remarkable success in image analysis, natural language processing, speech recognition, video classification, and other fields. However, deep learning requires substantial computing power, so optimizing deep learning system frameworks to reduce that demand is important for its practical application. Many-core processor architectures are widely used in deep learning, and balancing the load of computing tasks across the cores of such a processor is a valuable problem. This thesis studies how to partition the computational graph of a model effectively and achieve load balancing when training a deep learning model on a many-core processor architecture.

In this thesis, an automatic partitioning algorithm for many-core processors is designed. Firstly, the graph partitioning problem is analyzed. The thesis clarifies that the ultimate goal of the graph partitioning algorithm is to shorten the running time of the deep learning model, and that three factors need to be considered: load balancing, communication cost, and the storage upper limit. It follows that the amount of computation, storage, and routing of each subgraph affects the quality of the partition. Secondly, the graph partitioning problem is modeled as a Markov decision process. The current subgraph partition and core resource allocation of the computational graph are extracted as the state in reinforcement learning. The adjustment of layers between two adjacent cores is defined as the action. The running time and storage of the computational graph on the many-core processor are used to define the reward. Thirdly, a graph partitioning algorithm based on policy-based reinforcement learning is designed and implemented to solve the Markov decision process model. Finally, the results of automatic partitioning with this algorithm are compared with expert-designed partitions. The algorithm in this thesis can automatically divide the computational graph into subgraphs, allocate core resources to each subgraph, and optimize the running time of the deep learning model on many-core processors.

In this thesis, a prototype system for automatic partitioning of computational graphs is also implemented. Firstly, the requirements of the prototype system are analyzed. There are five functional requirements: uploading the model to be partitioned, constructing the computational graph, partitioning the computational graph, saving the current result, and viewing the partition result. Secondly, the prototype system is designed and implemented. The implementation uses Relay IR (Intermediate Representation), a widely used intermediate representation in deep learning compilers that can be transformed into a computational graph. The part of the system that constructs the computational graph extends Relay IR's support to the deep learning framework Caffe, so that Caffe models can be transformed into computational graphs. The system also generates a diagram of the computational graph by reading the graph's information. After the automatic partitioning algorithm has divided the computational graph into subgraphs, the partition results are displayed to the user. Finally, testing and analysis of the system show that it can automatically partition deep learning computational graphs and display the results.
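The Markov decision process formulation described above (state, action, reward) can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Layer` fields, the pipeline-bottleneck runtime estimate, the `comm_weight` and `penalty` parameters, and the function names are all hypothetical choices made only for illustration. The sketch assumes the graph is a linear chain of layers and that each core holds a contiguous, non-empty run of layers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    compute: float      # hypothetical compute cost of the layer
    storage: float      # hypothetical memory footprint of the layer
    output_size: float  # data routed onward if the next layer is on another core


# State: how many consecutive layers each core holds, e.g. (2, 2).
State = Tuple[int, ...]
# Action: (i, direction) acts on the boundary between core i and core i + 1.
#   direction = +1 pulls the first layer of core i + 1 back onto core i
#   direction = -1 pushes the last layer of core i onto core i + 1
Action = Tuple[int, int]


def step(state: State, action: Action) -> State:
    """Move one layer across the boundary between two adjacent cores."""
    i, direction = action
    src, dst = (i + 1, i) if direction == +1 else (i, i + 1)
    if state[src] <= 1:
        return state  # assumption: an illegal move (emptying a core) is a no-op
    counts = list(state)
    counts[src] -= 1
    counts[dst] += 1
    return tuple(counts)


def reward(layers: List[Layer], state: State, storage_limit: float,
           comm_weight: float = 1.0, penalty: float = 100.0) -> float:
    """Negative estimated running time, with a penalty for exceeding storage.

    Assumption: running time is approximated by the slowest core
    (its compute plus the cost of sending its output to the next core).
    """
    start, stage_times, overflow = 0, [], 0.0
    for k, n in enumerate(state):
        sub = layers[start:start + n]
        t = sum(l.compute for l in sub)
        if k < len(state) - 1:
            t += comm_weight * sub[-1].output_size  # communication cost
        stage_times.append(t)
        overflow += max(0.0, sum(l.storage for l in sub) - storage_limit)
        start += n
    return -(max(stage_times) + penalty * overflow)


# Usage: four layers on two cores; pushing one layer to core 1 balances the load.
layers = [Layer(4, 1, 1), Layer(1, 1, 1), Layer(1, 1, 1), Layer(2, 1, 1)]
s0 = (2, 2)
s1 = step(s0, (0, -1))                       # -> (1, 3)
r0 = reward(layers, s0, storage_limit=3.0)   # -> -6.0
r1 = reward(layers, s1, storage_limit=3.0)   # -> -5.0
```

In this toy setting the action that moves one layer from the heavily loaded core to its neighbor raises the reward from -6.0 to -5.0, which is the kind of signal a policy-based reinforcement learning agent would learn from. The storage penalty plays the role of the storage upper limit: a partition that overflows a core receives a much lower reward.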