
Efficient Deep Reinforcement Learning Algorithm For Cyber-physical System

Posted on: 2020-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Zou
Full Text: PDF
GTID: 2428330590473211
Subject: Computer technology
Abstract/Summary:
Reinforcement learning is an important branch of machine learning. It studies how an agent can learn a control policy for a specific task from its interactions with the external environment. Because reinforcement learning must represent highly complex policies, deep reinforcement learning algorithms, which use highly expressive deep neural networks as the policy representation, have become increasingly popular. Although the large parameter space of a deep neural network allows a deep reinforcement learning algorithm to master highly complex skills, it also means that the learning process requires a large amount of interaction data to reach good performance.

This sample inefficiency is particularly serious in cyber-physical systems. In a cyber-physical system, the agent interacts with the environment slowly, and during training a sub-optimal policy may output random actions that damage the agent or the environment, so interaction data is expensive. This thesis therefore studies how to improve the sample efficiency of deep reinforcement learning and reduce its heavy dependence on interaction data. It combines deep reinforcement learning with classical optimal control theory to improve sample efficiency while avoiding the limitations of traditional methods. Specifically, it pursues a sample-efficient deep reinforcement learning algorithm along two lines: a new initialization strategy and target-task decomposition.

In the first work, we propose an initialization strategy based on model predictive control (MPC). MPC is essentially a constrained optimization problem and can be viewed as an implicit policy. Using multiparametric programming, we transform the MPC controller into a fully equivalent piecewise-affine function; this operation turns the MPC into an explicit, parameterized policy. We then transform the piecewise-affine function into a deep neural network. Unlike the supervised learning widely used in imitation learning, the proposed method assigns the network weights directly. With this initialization, an existing deep reinforcement learning algorithm can fine-tune directly from the control performance of model predictive control. Since the initial performance of the neural network is identical to that of the MPC controller, the initialization strategy lets the agent search near a good initial solution and greatly improves the convergence and sample efficiency of the algorithm.

In the second work, we combine the initialization strategy with target-task decomposition. Although the initialization strategy improves sample efficiency, its performance depends on the original MPC controller, and previous studies have shown that, within a single application, MPC cannot perform well in all situations. We therefore construct an adaptive sub-task generation module: instead of requiring the agent to master a complex skill directly, the reinforcement learning algorithm accounts for the performance of MPC and decomposes the target task into a sequence of sub-tasks ordered from easy to difficult. Specifically, the sub-task generation module first chooses an initial sub-task that the original MPC controller can achieve; then, as the policy improves, the difficulty of the sub-tasks is raised continuously, finally helping the agent learn to complete the complex target task.

To verify the effectiveness of the algorithms, we tested them on different simulation platforms. In addition to the commonly used reinforcement learning benchmark OpenAI Gym, we used simulation software to build further test environments, such as a quadcopter and an urban traffic network. The experimental results show that the initialization strategy helps the reinforcement learning algorithm converge quickly to a better local optimum, and that, combined with the adaptive sub-task generation module, it achieves good sample efficiency and convergence performance across the different test environments.
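A minimal sketch of the core idea behind the initialization strategy, not the thesis implementation: for a simple 1-D constrained problem, the explicit MPC law reduces to a saturated linear feedback u(x) = clip(K·x, -u_max, u_max), which is piecewise affine, and a ReLU network can represent any piecewise-affine function exactly, so its weights can be assigned directly rather than learned by imitation. The gain K and bound u_max below are hypothetical values chosen for illustration.

```python
import numpy as np

K, u_max = -1.8, 1.0  # hypothetical explicit-MPC gain and input bound

def mpc_policy(x):
    """Explicit MPC law for the toy problem: saturated linear feedback (PWA in x)."""
    return np.clip(K * x, -u_max, u_max)

# Direct weight assignment, using clip(z, -a, a) = relu(z + a) - relu(z - a) - a
W1 = np.array([[K], [K]])       # hidden layer: both units compute K*x ...
b1 = np.array([u_max, -u_max])  # ... shifted by +u_max and -u_max
W2 = np.array([1.0, -1.0])      # output layer: h1 - h2 ...
b2 = -u_max                     # ... minus u_max

def relu(z):
    return np.maximum(z, 0.0)

def nn_policy(x):
    """Two-layer ReLU network whose weights are copied from the PWA law."""
    h = relu(W1 @ np.atleast_1d(x) + b1)
    return W2 @ h + b2

# The network matches the explicit MPC law everywhere, with no training.
xs = np.linspace(-2.0, 2.0, 101)
assert np.allclose([nn_policy(x) for x in xs], mpc_policy(xs))
```

From this initial point, the network can be fine-tuned by any standard deep reinforcement learning algorithm, starting from MPC-level performance instead of a random policy.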
Keywords/Search Tags:Deep reinforcement learning, sampling efficiency, model predictive control, cyber-physical system