
Deep Reinforcement Learning Based On Preferred Samples And Demonstrations

Posted on: 2021-01-26  Degree: Master  Type: Thesis
Country: China  Candidate: G P Xiang  Full Text: PDF
GTID: 2428330629951256  Subject: Control Science and Engineering
Abstract/Summary:
After years of development, the theory of deep reinforcement learning has gradually matured, and it has proved particularly effective on problems with high-dimensional raw inputs and on control and decision problems. However, in complex environments, deep reinforcement learning algorithms spend a great deal of time training the network, which makes them inefficient. To address this, the main contributions of this thesis are as follows.

First, because selecting samples at random for experience replay makes agent training inefficient, a prioritized experience replay algorithm based on preferred samples is proposed. A pre-training network first generates a threshold for sample selection, and only samples whose priority exceeds the threshold are admitted to the replay buffer. During training, a priority-updating scheme raises the sampling probability of higher-priority samples, so the agent learns the final goal faster. Finally, partial reward reshaping grants an additional reward to the last few steps before the final goal, so that during the final exploration phase the agent moves toward the goal faster, improving training efficiency. A minimal sketch of this buffer is given after this abstract.

Second, for experimental environments in which demonstration samples already exist, a deep inverse reinforcement learning model based on demonstrations is proposed; it uses the demonstration samples to improve training efficiency. First, a pre-training network makes the agent imitate the demonstrations as closely as possible. Second, a deep apprenticeship learning network reconstructs the reward function of the demonstrations and outputs the policy distribution over the demonstrated actions. An inverse reinforcement learning network then reconstructs the reward function of the randomly explored samples. Finally, a new loss function is built from the reconstructed reward functions and the action policy distribution and is passed to the forward deep reinforcement learning network, which improves the agent's training efficiency. A sketch of this combined loss also follows below.

Results on the Gym and Atari experimental platforms show that, compared with classic reinforcement learning algorithms, the proposed methods train faster and more efficiently in deep reinforcement learning environments, owing to the sample selection and to the new reward function constructed from demonstrations. The thesis contains 20 figures, 4 tables, and 70 references.
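The following Python sketch illustrates the preferred-sample replay idea: a threshold (assumed here to come from the pre-training stage) filters which transitions are stored, priorities are updated from TD errors, and a reward-reshaping helper adds a bonus to the last few steps before the goal. The class name, the TD-error-based priority, and the shaping parameters are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

class PreferredReplayBuffer:
    """Replay buffer that only admits transitions whose priority exceeds
    a threshold (assumed to be produced by the pre-training network) and
    samples stored transitions in proportion to their priorities."""

    def __init__(self, capacity, threshold, alpha=0.6):
        self.capacity = capacity
        self.threshold = threshold  # sample-selection threshold
        self.alpha = alpha          # how strongly priority skews sampling
        self.buffer = []
        self.priorities = []

    def add(self, transition, td_error):
        priority = abs(td_error) + 1e-6
        if priority < self.threshold:      # preferred-sample filter
            return
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Priority updating: samples that still carry a large TD error
        # become more likely to be drawn again.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6

def shaped_reward(reward, steps_to_goal, bonus=1.0, horizon=5):
    """Partial reward reshaping: add a bonus to the last few steps
    before the final goal (bonus and horizon are assumed values)."""
    return reward + bonus if steps_to_goal <= horizon else reward
```

The threshold makes the buffer selective at write time, while the priority update biases reads; together they keep the agent training on the most informative transitions.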
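The second model can be sketched in the same spirit. Below, one reward network is instantiated twice, once for the demonstrations and once for the explored samples, and a combined loss adds an imitation term (matching the demonstration policy distribution) to the TD objective. The network shapes, the weighting factor lam, and every name here are assumptions for illustration; PyTorch is used only as a convenient vehicle, not because the thesis specifies it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Reconstructs a scalar reward from a state-action pair; one copy
    plays the role of the apprenticeship-learning reward, another the
    inverse-RL reward for randomly explored samples."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def combined_loss(policy_logits, demo_actions, q_pred, q_target, lam=0.5):
    """New loss = TD error computed against targets built from the
    reconstructed rewards, plus an imitation term that pulls the policy
    toward the demonstration action distribution."""
    imitation = F.cross_entropy(policy_logits, demo_actions)
    td = F.mse_loss(q_pred, q_target)
    return td + lam * imitation

# Hypothetical usage: q_target would be computed from these networks'
# outputs rather than from the environment reward.
demo_reward_net = RewardNet(state_dim=8, action_dim=4)
explore_reward_net = RewardNet(state_dim=8, action_dim=4)
```

Passing this loss to the forward deep RL network lets the demonstration signal shape learning even before the environment reward becomes informative.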
Keywords/Search Tags: deep reinforcement learning, preferred samples, inverse reinforcement learning, demonstrations