
Efficient Deep Reinforcement Learning Algorithm For Cyber-physical System

Posted on: 2020-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Zou
Full Text: PDF
GTID: 2428330590473211
Subject: Computer technology
Abstract/Summary:
Reinforcement learning is an important branch of machine learning. It studies how an agent can learn a control policy for a specific task from its interactions with the external environment. Because reinforcement learning must represent highly complex policies, deep reinforcement learning algorithms, which use highly expressive deep neural networks as the policy representation, have become increasingly popular. Although the large parameter space of a deep neural network allows a deep reinforcement learning algorithm to master highly complex skills, it also means that the learning process requires a large amount of interaction data to reach good performance.

This sample inefficiency is particularly serious in cyber-physical systems. In a cyber-physical system, the agent interacts with the environment slowly, and during training a sub-optimal policy may output random actions that damage the agent or the environment, so interaction data is expensive. This thesis therefore studies how to improve the sample efficiency of deep reinforcement learning and reduce its heavy dependence on interaction data. It combines deep reinforcement learning with classical optimal control theory to improve sample efficiency while avoiding the limitations of traditional methods. Specifically, it pursues a sample-efficient deep reinforcement learning algorithm along two lines: a new initialization strategy and target-task decomposition.

In the first work, we propose an initialization strategy based on model predictive control (MPC). MPC is essentially a constrained optimization problem and can be viewed as an implicit policy. Using multiparametric programming, we transform the MPC controller into a fully equivalent piecewise-affine function; this operation turns the MPC into an explicit, parameterized policy. We then transform the piecewise-affine function into a deep neural network. Unlike the supervised learning widely used in imitation learning, the proposed method assigns the network weights directly. With this initialization, an existing deep reinforcement learning algorithm can fine-tune directly from the control performance of model predictive control. Since the initial performance of the neural network is identical to that of the MPC controller, the initialization strategy lets the agent search near a good initial solution and greatly improves the convergence and sample efficiency of the algorithm.

In the second work, we combine the initialization strategy with target-task decomposition. Although the initialization strategy improves sample efficiency, its performance depends on the original MPC controller, and previous studies have shown that, within a single application, MPC cannot perform well in all situations. We therefore construct an adaptive sub-task generation module: instead of requiring the agent to master a complex skill directly, the reinforcement learning algorithm accounts for the performance of MPC and decomposes the target task into a sequence of sub-tasks ordered from easy to difficult. Specifically, the sub-task generation module first chooses an initial sub-task that the original MPC controller can achieve; then, as the policy improves, the difficulty of the sub-tasks is raised continuously, finally helping the agent learn to complete the complex target task.

To verify the effectiveness of the algorithms, we tested them on different simulation platforms. In addition to the commonly used reinforcement learning benchmark OpenAI Gym, we used simulation software to build further test environments, such as a quadcopter and an urban traffic network. The experimental results show that the initialization strategy helps the reinforcement learning algorithm converge quickly to a better local optimum, and that, combined with the adaptive sub-task generation module, it achieves good sample efficiency and convergence performance across the different test environments.
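A minimal sketch of the core idea behind the initialization strategy, not the thesis implementation: for a simple 1-D constrained problem, the explicit MPC law reduces to a saturated linear feedback u(x) = clip(K·x, -u_max, u_max), which is piecewise affine, and a ReLU network can represent any piecewise-affine function exactly, so its weights can be assigned directly rather than learned by imitation. The gain K and bound u_max below are hypothetical values chosen for illustration.

```python
import numpy as np

K, u_max = -1.8, 1.0  # hypothetical explicit-MPC gain and input bound

def mpc_policy(x):
    """Explicit MPC law for the toy problem: saturated linear feedback (PWA in x)."""
    return np.clip(K * x, -u_max, u_max)

# Direct weight assignment, using clip(z, -a, a) = relu(z + a) - relu(z - a) - a
W1 = np.array([[K], [K]])       # hidden layer: both units compute K*x ...
b1 = np.array([u_max, -u_max])  # ... shifted by +u_max and -u_max
W2 = np.array([1.0, -1.0])      # output layer: h1 - h2 ...
b2 = -u_max                     # ... minus u_max

def relu(z):
    return np.maximum(z, 0.0)

def nn_policy(x):
    """Two-layer ReLU network whose weights are copied from the PWA law."""
    h = relu(W1 @ np.atleast_1d(x) + b1)
    return W2 @ h + b2

# The network matches the explicit MPC law everywhere, with no training.
xs = np.linspace(-2.0, 2.0, 101)
assert np.allclose([nn_policy(x) for x in xs], mpc_policy(xs))
```

From this initial point, the network can be fine-tuned by any standard deep reinforcement learning algorithm, starting from MPC-level performance instead of a random policy.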
Keywords/Search Tags:Deep reinforcement learning, sampling efficiency, model predictive control, cyber-physical system