Research And Design Of Artificial Intelligence Training Platform Based On Cloud Computing

Posted on: 2020-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Liu
GTID: 2428330572976412
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of information technology and big data in recent years, artificial intelligence has become an increasingly active research area and has made considerable progress. However, researchers often run into problems while training models. For example, computing resources (including but not limited to CPU, memory, and graphics cards) may be limited, or the machine environment may become unstable because of human factors, which leaves the machines unusable for other researchers. For these reasons, this thesis proposes a cloud computing-based artificial intelligence training platform.

The thesis first surveys the current state of cloud computing, virtualization, containerization, container scheduling, and other key technologies, as well as the development status of artificial intelligence training platforms at home and abroad. Then, after analyzing the advantages and disadvantages of each container scheduling scheme, it settles on the most popular combination in cloud computing, Docker containerization plus the Kubernetes container scheduler, to build the platform. Next, it analyzes the requirements and feasibility of the platform against current practical application scenarios and proposes the overall platform architecture on that basis. Finally, once the basic functions of the platform are complete, it studies the platform's storage environment, selects the Ceph distributed file system as the storage scheme, and optimizes that file system.

The main work of the thesis is as follows:

1) By analyzing the storage environment of the artificial intelligence training platform and comparing the performance of the NFS and Ceph file systems, this thesis chooses Ceph as the storage backend and tunes its performance by optimizing the network layer. In experimental tests, the transfer speed after optimization is 2.6 times the original speed, and the iteration time of artificial intelligence model training is greatly shortened.

2) By analyzing the resource scheduling requirements of the artificial intelligence training platform, this thesis extends the basic Kubernetes scheduling algorithm and proposes a new predicate (pre-selection) strategy, PodChoiceFitResources, and a new priority (preferred) strategy, MaxResourceUsagePriority. Practical use shows that the new algorithm not only dispatches pending Pods accurately to the target node but also, with resources held constant, allows the number of scheduled tasks to increase while the platform's overall task running time stays unchanged or even shortens.

3) By analyzing the operation and management functions of the artificial intelligence training platform, and building on the two most mature open-source solutions, Heapster+InfluxDB+Grafana and Prometheus, this thesis proposes a new management combination, Prometheus+Grafana, to monitor the platform's resource usage and issue the necessary resource alerts.

4) An artificial intelligence training platform is built and implemented. Experimental testing of the platform's actual functions, such as image construction, resource application, and container creation, verifies that the platform allocates resources reasonably and efficiently.
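The two-phase scheduling idea described in the abstract, a pre-selection step that filters out unsuitable nodes followed by a preferred step that ranks the remaining ones, can be illustrated with a minimal Python sketch. The abstract does not define how PodChoiceFitResources or MaxResourceUsagePriority actually work, so everything below is an assumption: the `Node` and `Pod` types, the functions `fits_resources`, `usage_score`, and `schedule`, and the scoring rule (prefer the node with the highest CPU and memory utilization after placement) are hypothetical and are not the author's algorithm or the Kubernetes API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node model: capacities and current usage."""
    name: str
    cpu_capacity: float  # cores
    mem_capacity: float  # GiB
    cpu_used: float = 0.0
    mem_used: float = 0.0


@dataclass
class Pod:
    """Hypothetical pod model: only resource requests are considered."""
    name: str
    cpu_request: float
    mem_request: float


def fits_resources(pod: Pod, node: Node) -> bool:
    """Pre-selection (filter): keep nodes with enough free CPU and memory."""
    return (node.cpu_used + pod.cpu_request <= node.cpu_capacity
            and node.mem_used + pod.mem_request <= node.mem_capacity)


def usage_score(pod: Pod, node: Node) -> float:
    """Preferred (score): mean CPU/memory utilization after placement.

    Assumed rule: a higher utilization gives a higher score, which packs
    pods onto already-busy nodes. The thesis may use a different formula.
    """
    cpu = (node.cpu_used + pod.cpu_request) / node.cpu_capacity
    mem = (node.mem_used + pod.mem_request) / node.mem_capacity
    return (cpu + mem) / 2


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    """Filter the nodes, pick the best-scoring one, and reserve resources."""
    feasible = [n for n in nodes if fits_resources(pod, n)]
    if not feasible:
        return None  # no node can host the pod
    best = max(feasible, key=lambda n: usage_score(pod, n))
    best.cpu_used += pod.cpu_request
    best.mem_used += pod.mem_request
    return best


if __name__ == "__main__":
    cluster = [
        Node("node-a", cpu_capacity=8, mem_capacity=32, cpu_used=6, mem_used=24),
        Node("node-b", cpu_capacity=8, mem_capacity=32),
    ]
    chosen = schedule(Pod("train-job", cpu_request=2, mem_request=8), cluster)
    print(chosen.name if chosen else "unschedulable")  # prints "node-a"
```

Under this assumed rule, the busier node is filled first, which leaves emptier nodes free for larger jobs. That is one plausible way to fit more tasks into a fixed pool of resources, though whether it matches the thesis's approach cannot be determined from the abstract alone.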
Keywords/Search Tags: artificial intelligence, Ceph, Kubernetes