Design And Implementation Of Tensorflow Distributed Model Training Platform On Kubernetes

Posted on: 2019-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
Full Text: PDF
GTID: 2428330590974172
Subject: Software engineering
Abstract/Summary:
In recent years, artificial intelligence has grown steadily more popular. As one of the most active topics in technology, it has driven both social change and technical development. Machine learning has quietly entered daily life, and its applications have been deployed across many business areas. These applications depend on models obtained by training on large amounts of historical data. To obtain valuable results and high-quality products from artificial intelligence, it is therefore necessary to study how to train models quickly and efficiently.

This thesis addresses the speed of model training and several problems that arise during training, from the perspective of algorithm engineers, and designs and implements a usable distributed model training platform. The platform provides a set of online services, including management tools for datasets, models, and scripts, as well as runtime environments. It focuses on distributed computation with the TensorFlow framework to reduce model training time and thereby greatly improve the efficiency of model development.

The thesis, "Design and implementation of Tensorflow distributed model training platform on Kubernetes", mainly introduces the model training API service and the TensorFlow Job execution engine. Each service is deployed as an independent service platform, and all platforms are organized through a microservice architecture. The main functions of the model training API service are the management of datasets, models, projects, components, and pipelines. The main functions of the TensorFlow Job execution engine are running a Job, deleting a Job, and querying the details of a specific Job. The design and implementation of these functions follow the software engineering development process. The model training API service is based on the B/S (browser/server) architecture, is written in Java with the Spring Boot framework, and uses MySQL to persist data. The Job execution engine is implemented in Python with the Django framework and uses Redis to store unstructured data. The work achieved the expected results and can provide TensorFlow model training services for algorithm engineers.
Keywords/Search Tags:Model training, Kubernetes, Tensorflow, B/S architecture, Microservice
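
Illustrative sketch (not taken from the thesis): the abstract names three operations for the Job execution engine, namely running a Job, deleting a Job, and querying a Job's details. The Python below shows one way such an interface might look. The class `JobEngine`, the function `build_tf_configs`, the host-name pattern, and port 2222 are all hypothetical, and a plain dictionary stands in for the Redis store. The only element drawn from TensorFlow itself is the shape of the `TF_CONFIG` JSON (a `cluster` map plus a `task` entry) that distributed TensorFlow reads on each replica.

```python
import json


def build_tf_configs(job_name, num_ps, num_workers, port=2222):
    """Return one TF_CONFIG JSON string per replica, keyed by (task type, index)."""
    # Host names are hypothetical; on Kubernetes they would typically be
    # resolved through Services created for each replica.
    cluster = {
        "ps": [f"{job_name}-ps-{i}:{port}" for i in range(num_ps)],
        "worker": [f"{job_name}-worker-{i}:{port}" for i in range(num_workers)],
    }
    configs = {}
    for task_type, hosts in cluster.items():
        for index in range(len(hosts)):
            configs[(task_type, index)] = json.dumps(
                {"cluster": cluster, "task": {"type": task_type, "index": index}}
            )
    return configs


class JobEngine:
    """Toy job registry; a plain dict stands in for the Redis store."""

    def __init__(self):
        self._jobs = {}

    def run(self, job_name, num_ps, num_workers):
        """Register a Job and compute the TF_CONFIG for each of its replicas."""
        if job_name in self._jobs:
            raise ValueError(f"job {job_name!r} already exists")
        self._jobs[job_name] = {
            "status": "Running",
            "tf_configs": build_tf_configs(job_name, num_ps, num_workers),
        }

    def detail(self, job_name):
        """Return the stored record for a Job, or None if it is unknown."""
        return self._jobs.get(job_name)

    def delete(self, job_name):
        """Remove a Job; return True if it existed."""
        return self._jobs.pop(job_name, None) is not None
```

In a real deployment, each `TF_CONFIG` string would be injected as an environment variable into the matching container, and `run` and `delete` would create and remove Kubernetes resources rather than dictionary entries.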