
Network Acceleration Architecture Designed For AI Applications

Posted on: 2020-01-01    Degree: Master    Type: Thesis
Country: China    Candidate: S B Qiu    Full Text: PDF
GTID: 2428330602450337    Subject: Communication and Information System
Abstract/Summary:
Artificial intelligence technology has developed at an unprecedented pace in recent years and now plays an indispensable role in many fields. With the arrival of the age of artificial intelligence, increasingly complicated machine learning models have emerged to cope with massive training data and high task complexity. Large-scale machine learning models typically offer higher accuracy and stronger expressive ability, helping people solve a variety of delicate problems. However, large-scale models inevitably bring challenges in both computing power and storage: high computational complexity makes the time consumed by a single training run unacceptable, and the sheer scale of the model may exceed what a single machine can store. It is therefore necessary to adopt distributed machine learning clusters to carry out training. Parallelization techniques, cluster architectures, and communication mechanisms all strongly influence the performance of such clusters, so how to partition, store, and train the data and models across a distributed cluster has become the central problem of distributed machine learning. The parallelization technique adopted varies with the model, and large-scale models that cannot be stored on a single machine can only use model parallelism. The relatively slow training speed of existing distributed machine learning systems and the large scale of model parameters remain the main challenges in this field.

Addressing these two challenges, this thesis analyzes in detail two model segmentation methods used in model parallelism, layer-based segmentation and cross-layer segmentation, together with their corresponding traffic characteristics. Based on these characteristics, it proposes a segmentation optimization strategy named MPOS, which balances the computation and communication loads of the sub-models produced by segmentation according to the communication characteristics of different layers and the composition relationships between layers, in order to accelerate model training on a distributed cluster platform. Test results from deployment on a real platform show that the training time with MPOS-based segmentation is 15% lower than that of the general segmentation method.

To further accelerate model training on distributed clusters, this thesis also analyzes the traffic characteristics of the different parallelization techniques during training and designs a distributed cluster network architecture tailored to AI applications, called ECube. ECube combines the traffic characteristics of AI applications with the communication patterns between working nodes during training, delivering significant performance improvements over the typical Fat-Tree and BCube architectures in both data parallelism and model parallelism. Simulation comparisons show that the model training time under the ECube architecture is about 40% lower than under Fat-Tree for both parallelization techniques, thereby accelerating model training and reducing the training time of distributed machine learning.
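For readers unfamiliar with layer-based segmentation, the following is a minimal illustrative sketch (not the MPOS strategy described in the thesis) of how a model can be split by layers into sub-models held on different devices, assuming PyTorch; the class and attribute names (TwoStageModel, stage0, stage1) are hypothetical. Only the boundary activation is transferred between devices in each pass, and this inter-worker traffic is what segmentation strategies such as MPOS aim to balance against the compute assigned to each sub-model.

```python
# Illustrative sketch of layer-based model parallelism (two stages).
# Assumes PyTorch; falls back to CPU if two CUDA devices are not available.
import torch
import torch.nn as nn


def pick_devices():
    # Use two GPUs if present, otherwise keep both stages on the CPU.
    if torch.cuda.device_count() >= 2:
        return "cuda:0", "cuda:1"
    return "cpu", "cpu"


class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dev0, self.dev1 = pick_devices()
        # Sub-model 0: early layers, held by worker/device 0.
        self.stage0 = nn.Sequential(
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 2048), nn.ReLU(),
        ).to(self.dev0)
        # Sub-model 1: later layers, held by worker/device 1.
        self.stage1 = nn.Sequential(
            nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, 10),
        ).to(self.dev1)

    def forward(self, x):
        h = self.stage0(x.to(self.dev0))
        # Cross-worker traffic: only this boundary activation is transferred.
        return self.stage1(h.to(self.dev1))


if __name__ == "__main__":
    model = TwoStageModel()
    x = torch.randn(32, 1024)
    y = model(x)            # forward pass spans both sub-models
    y.sum().backward()      # gradients flow back across the same boundary
    print(y.shape)
```

In a real multi-node deployment the two stages would live on different machines and the boundary activation would cross the cluster network, which is why the network architecture (Fat-Tree, BCube, or the proposed ECube) directly affects training time.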
Keywords/Search Tags: artificial intelligence, distributed cluster, parallelization, network architecture, acceleration