
Network Acceleration Architecture Designed For AI Applications

Posted on: 2020-01-01    Degree: Master    Type: Thesis
Country: China    Candidate: S B Qiu    Full Text: PDF
GTID: 2428330602450337    Subject: Communication and Information System
Abstract/Summary:
Artificial intelligence technology has developed at an unprecedented pace in recent years and now plays an indispensable role in many fields. With the arrival of the age of artificial intelligence, increasingly complicated machine learning models have emerged to cope with massive training data and high task complexity. Large-scale machine learning models typically offer higher accuracy and stronger expressive ability, helping people solve a variety of delicate problems. However, large-scale models inevitably bring challenges in both computing power and storage: high computational complexity makes the time consumed by a single training run unacceptable, and the sheer scale of the model may exceed what a single machine can store. It is therefore necessary to adopt distributed machine learning clusters to carry out training. Parallelization techniques, cluster architectures, and communication mechanisms all strongly influence the performance of such clusters, so how to partition, store, and train the data and models across a distributed cluster has become the central problem of distributed machine learning. The parallelization technique adopted varies with the model, and large-scale models that cannot be stored on a single machine can only use model parallelism. The relatively slow training speed of existing distributed machine learning systems and the large scale of model parameters remain the main challenges in this field.

Addressing these two challenges, this thesis analyzes in detail two model segmentation methods used in model parallelism, layer-based segmentation and cross-layer segmentation, together with their corresponding traffic characteristics. Based on these characteristics, it proposes a segmentation optimization strategy named MPOS, which balances the computation and communication loads of the sub-models produced by segmentation according to the communication characteristics of different layers and the composition relationships between layers, in order to accelerate model training on a distributed cluster platform. Test results from deployment on a real platform show that the training time with MPOS-based segmentation is 15% lower than that of the general segmentation method.

To further accelerate model training on distributed clusters, this thesis also analyzes the traffic characteristics of the different parallelization techniques during training and designs a distributed cluster network architecture tailored to AI applications, called ECube. ECube combines the traffic characteristics of AI applications with the communication patterns between working nodes during training, delivering significant performance improvements over the typical Fat-Tree and BCube architectures in both data parallelism and model parallelism. Simulation comparisons show that the model training time under the ECube architecture is about 40% lower than under Fat-Tree for both parallelization techniques, thereby accelerating model training and reducing the training time of distributed machine learning.
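For readers unfamiliar with layer-based segmentation, the following is a minimal illustrative sketch (not the MPOS strategy described in the thesis) of how a model can be split by layers into sub-models held on different devices, assuming PyTorch; the class and attribute names (TwoStageModel, stage0, stage1) are hypothetical. Only the boundary activation is transferred between devices in each pass, and this inter-worker traffic is what segmentation strategies such as MPOS aim to balance against the compute assigned to each sub-model.

```python
# Illustrative sketch of layer-based model parallelism (two stages).
# Assumes PyTorch; falls back to CPU if two CUDA devices are not available.
import torch
import torch.nn as nn


def pick_devices():
    # Use two GPUs if present, otherwise keep both stages on the CPU.
    if torch.cuda.device_count() >= 2:
        return "cuda:0", "cuda:1"
    return "cpu", "cpu"


class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dev0, self.dev1 = pick_devices()
        # Sub-model 0: early layers, held by worker/device 0.
        self.stage0 = nn.Sequential(
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
        ).to(self.dev0)
        # Sub-model 1: later layers, held by worker/device 1.
        self.stage1 = nn.Sequential(
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 10),
        ).to(self.dev1)

    def forward(self, x):
        h = self.stage0(x.to(self.dev0))
        # Cross-worker traffic: only this boundary activation is transferred.
        return self.stage1(h.to(self.dev1))


if __name__ == "__main__":
    model = TwoStageModel()
    x = torch.randn(32, 1024)
    y = model(x)            # forward pass spans both sub-models
    y.sum().backward()      # gradients flow back across the same boundary
    print(y.shape)
```

In a real multi-node deployment the two stages would live on different machines and the boundary activation would cross the cluster network, which is why the network architecture (Fat-Tree, BCube, or the proposed ECube) directly affects training time.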
Keywords/Search Tags: artificial intelligence, distributed cluster, parallelization, network architecture, acceleration