
Research And Implementation Of The Heterogeneous TensorFlow Architecture

Posted on: 2019-06-17
Degree: Master
Type: Thesis
Country: China
Candidate: G F Lin
Full Text: PDF
GTID: 2348330542472647
Subject: Engineering
Abstract/Summary:
With the rapid development of science and technology worldwide, artificial intelligence has become both a research hotspot and a strategic focus of major companies. TensorFlow is an open-source machine learning framework released by Google. It has attracted wide attention since it was open-sourced and is one of the most popular machine learning and deep learning projects in the GitHub community. TensorFlow can currently be deployed on multiple cloud platforms, but software dependency and management problems remain. Because Docker offers rapid deployment and portability, the virtualization and heterogeneous deployment of TensorFlow can be realized with Docker containers, which resolves TensorFlow's environment dependencies and provides a convenient development environment for researchers and engineers, giving the work practical application value.

Based on a comparative analysis of TensorFlow and other mainstream deep learning frameworks, this thesis proposes optimization schemes for TensorFlow's shortcomings in task scheduling and fault tolerance. Taking advantage of Docker's resource isolation, high performance, and portability, TensorFlow is deployed in Docker containers to realize its virtualized, heterogeneous operation. A TensorFlow deep learning system is then built on a Docker cluster, so that distributed deployment increases the data throughput of the platform, and a multi-GPU parallel model training scheme addresses the long training times of deep learning.

The main work of this thesis is as follows:

(1) Research on and improvement of the TensorFlow architecture: mainstream deep learning frameworks are analyzed along several dimensions, the problems of task scheduling, fault tolerance, and performance monitoring in TensorFlow are examined, and corresponding optimization and improvement schemes are proposed.

(2) Design of the Docker container cluster: Open vSwitch with GRE tunneling is used to achieve cross-host communication between containers, and a Docker container cluster is built on this basis, providing the foundation for the subsequent experiments.

(3) Load balancing of the Docker container cluster: real-time performance data are collected from the hosts and containers, and an elastic scaling mechanism for the whole cluster is designed. When a user-defined resource usage threshold is exceeded, elastic scaling is triggered, and a scheduling policy based on resource usage selects the host on which application containers are created or destroyed, improving scheduling efficiency and resource utilization in the cluster (a minimal sketch of such a threshold-triggered policy follows this list).

(4) Design and implementation of a data-parallel training scheme: the original data-parallel approach, which updates the model synchronously or asynchronously through a master-slave (parameter server) structure, suffers from low parallel efficiency; this thesis therefore designs a ring-structured parallel scheme that uses the GPUs more efficiently than the master-slave structure (see the ring all-reduce sketch after this list).

(5) Experimental comparison: experiments on the heterogeneous TensorFlow design show that it effectively improves the utilization of computing resources and shortens deep learning training time, which has practical application value.
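To make the threshold-triggered elastic scaling in contribution (3) concrete, here is a minimal single-host sketch in Python using the Docker SDK for Python (docker-py). The container label, image name, thresholds, replica limits, and sampling interval are illustrative assumptions, not the thesis's actual values, and the real system additionally chooses among multiple hosts according to their resource usage, which is omitted here.

```python
import time
import docker  # Docker SDK for Python (docker-py)

CPU_HIGH, CPU_LOW = 80.0, 20.0          # illustrative CPU thresholds (%)
MIN_REPLICAS, MAX_REPLICAS = 1, 8       # illustrative replica limits
APP_LABEL = "app=tf-worker"             # hypothetical label marking TensorFlow containers
IMAGE = "tensorflow/tensorflow:latest"  # placeholder image name

def cpu_percent(stats):
    """CPU usage in percent, computed from one one-shot `docker stats` sample."""
    cpu_delta = (stats["cpu_stats"]["cpu_usage"]["total_usage"]
                 - stats["precpu_stats"].get("cpu_usage", {}).get("total_usage", 0))
    sys_delta = (stats["cpu_stats"].get("system_cpu_usage", 0)
                 - stats["precpu_stats"].get("system_cpu_usage", 0))
    ncpu = stats["cpu_stats"].get("online_cpus", 1)
    return (cpu_delta / sys_delta) * ncpu * 100.0 if sys_delta > 0 else 0.0

client = docker.from_env()

while True:
    workers = client.containers.list(filters={"label": APP_LABEL})
    if workers:
        # Average CPU load over all application containers on this host.
        load = sum(cpu_percent(c.stats(stream=False)) for c in workers) / len(workers)
        if load > CPU_HIGH and len(workers) < MAX_REPLICAS:
            # Scale out: start one more application container.
            client.containers.run(IMAGE, detach=True, labels={"app": "tf-worker"})
        elif load < CPU_LOW and len(workers) > MIN_REPLICAS:
            # Scale in: remove one replica.
            workers[0].remove(force=True)
    time.sleep(30)  # sampling interval
```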
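For contribution (4), the ring structure amounts to a ring all-reduce of the per-GPU gradients: every worker exchanges gradient chunks only with its ring neighbour, so no central parameter server becomes a bandwidth bottleneck. The thesis's own multi-GPU code is not reproduced in this abstract, so the following is a minimal NumPy simulation of the idea rather than the actual implementation.

```python
import numpy as np

def ring_allreduce(grads):
    """Average per-worker gradients with a simulated ring all-reduce.

    grads: list of n equal-length 1-D arrays, one per (GPU) worker.
    Returns n arrays, each equal to the element-wise mean, so every
    worker can apply the same update without a central parameter server.
    """
    n = len(grads)
    chunks = [list(np.array_split(g.astype(np.float64), n)) for g in grads]

    # Scatter-reduce: in n-1 steps each worker forwards one chunk to its
    # ring neighbour; afterwards worker r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        for r in range(n):
            c = (r - step) % n
            dst = (r + 1) % n
            chunks[dst][c] = chunks[dst][c] + chunks[r][c]

    # All-gather: circulate the completed chunks so every worker ends up
    # holding every fully reduced chunk.
    for step in range(n - 1):
        for r in range(n):
            c = (r + 1 - step) % n
            dst = (r + 1) % n
            chunks[dst][c] = chunks[r][c]

    return [np.concatenate(chunks[r]) / n for r in range(n)]

# Example: 4 simulated workers, each with its own gradient vector.
workers = [np.random.rand(12) for _ in range(4)]
reduced = ring_allreduce(workers)
assert np.allclose(reduced[0], np.mean(workers, axis=0))
```

In an actual multi-GPU TensorFlow setup the same communication pattern is what collective libraries implement; since each worker ends up with the identical averaged gradient, the model replicas stay synchronized without a master node.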
Keywords/Search Tags: TensorFlow, Docker, virtualized heterogeneity, elastic scaling, resource scheduling, performance monitoring, parallel training