
Research On Sample Generation And Selection Methods For Deep Reinforcement Learning

Posted on: 2022-02-01
Degree: Master
Type: Thesis
Country: China
Candidate: T Yang
Full Text: PDF
GTID: 2518306527970309
Subject: Computer Science and Technology
Abstract/Summary:
Deep reinforcement learning is an important branch of artificial intelligence research for sequential decision-making problems. It learns an optimal policy from the samples generated as the agent interacts with the environment. Because the learning process relies on samples produced by a large number of agent-environment interactions, deep reinforcement learning algorithms are limited in applications where samples are expensive to acquire. Different behavior policies generate different samples, and the choice of samples in turn affects the learned policy. To improve the sample efficiency of deep reinforcement learning, reduce the number of agent-environment interactions, and obtain a high-quality policy, this thesis completes the following work:

(1) An adaptive ε-greedy policy based on the average episodic cumulative reward (AECR-DQN) is proposed. The ε-greedy policy, commonly used in deep reinforcement learning, is a random exploration policy for sample generation. It ignores other factors that affect the agent's decision-making and is therefore somewhat blind. This thesis instead uses the episodic cumulative reward the agent receives after completing a task to guide the agent in choosing between exploration and exploitation (sketched in code after the abstract). Experimental results show that a deep Q-network with the adaptive ε-greedy policy based on the average episodic cumulative reward generates samples that are more conducive to learning the optimal policy and obtains higher rewards.

(2) In contrast to traditional deep reinforcement learning, which samples one-step transitions uniformly at random from the experience replay memory, a method is proposed that generates and selects whole episodes as training samples. First, a method for generating episode samples based on a genetic crossover operator (GCO-DQN) is proposed, in which a similar state shared by two episodes serves as the crossover point to synthesize episodes that have never actually occurred, increasing the number and diversity of episodes. Building on this expanded set of episodes, a method for selecting episodes based on a genetic selection operator (GSCO-DQN) is proposed, using the cumulative reward of an episode as the criterion of its importance (see the second sketch after the abstract). The method preserves episode diversity while increasing the sampling probability of important episodes. Experimental results show that generating and selecting deep Q-network samples with genetic operators reduces the number of agent-environment interactions, improves sample utilization, and yields a policy with higher rewards.

(3) Combining AECR-DQN and GSCO-DQN, a sample generation and selection method based on genetic operators and the adaptive ε-greedy policy (AECR-GSCO-DQN) is proposed. The adaptive ε-greedy policy first generates samples in a more targeted manner; the genetic crossover operator is then applied to these samples to obtain more diverse ones, and the genetic selection operator finally picks the samples most conducive to learning the best policy. Experimental results show that, compared with GSCO-DQN, AECR-GSCO-DQN achieves a higher average reward and improves the quality of the learned policy.
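To make the idea in (1) concrete, the following is a minimal Python sketch of an adaptive ε-greedy schedule driven by the average episodic cumulative reward. The abstract does not give the exact update rule, so the class name, the fixed ±step adjustment, and the [eps_min, eps_max] bounds are illustrative assumptions, not the author's actual method.

```python
import random
import numpy as np


class AdaptiveEpsilonGreedy:
    """Hypothetical sketch of the AECR idea in (1): epsilon adapts by comparing
    the latest episode's cumulative reward with the running average of all
    finished episodes. The exact rule in the thesis is not specified here."""

    def __init__(self, eps_min=0.05, eps_max=1.0, step=0.05):
        self.eps = eps_max
        self.eps_min, self.eps_max, self.step = eps_min, eps_max, step
        self.episode_returns = []          # cumulative reward of each finished episode

    def end_episode(self, episode_return):
        """Call once per finished episode with its cumulative reward."""
        self.episode_returns.append(episode_return)
        avg = np.mean(self.episode_returns)
        if episode_return >= avg:          # better than average: exploit more
            self.eps = max(self.eps_min, self.eps - self.step)
        else:                              # worse than average: explore more
            self.eps = min(self.eps_max, self.eps + self.step)

    def select_action(self, q_values):
        """Standard epsilon-greedy action selection over a vector of Q-values."""
        if random.random() < self.eps:
            return random.randrange(len(q_values))
        return int(np.argmax(q_values))
```

In a DQN training loop one would call select_action on the Q-network output at every step and end_episode with the total reward when an episode terminates, so that ε shrinks while the agent outperforms its own running average and grows again when performance drops.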
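The episode-level genetic operators in (2) can be sketched in the same spirit. The state-similarity test, the fitness shift, and the function names below are assumptions made for illustration; the abstract only specifies that a shared similar state serves as the crossover point and that an episode's cumulative reward measures its importance.

```python
import random
import numpy as np

# An episode is a list of transitions (state, action, reward, next_state, done).

def similar(s1, s2, tol=1e-3):
    """Assumed similarity test between two states (small Euclidean distance)."""
    return np.linalg.norm(np.asarray(s1) - np.asarray(s2)) < tol

def crossover_episodes(ep_a, ep_b, tol=1e-3):
    """GCO idea from (2): splice the prefix of ep_a onto the suffix of ep_b
    at a pair of similar states, producing an episode never actually run."""
    for i, transition_a in enumerate(ep_a):
        for j, transition_b in enumerate(ep_b):
            if similar(transition_a[0], transition_b[0], tol):
                return ep_a[:i] + ep_b[j:]
    return None                            # no similar state found, no offspring

def select_episodes(episodes, k):
    """GSCO idea from (2): fitness-proportional sampling where the fitness of
    an episode is its cumulative reward (shifted to stay positive)."""
    returns = np.array([sum(t[2] for t in ep) for ep in episodes], dtype=float)
    fitness = returns - returns.min() + 1e-6
    probs = fitness / fitness.sum()
    idx = np.random.choice(len(episodes), size=k, replace=True, p=probs)
    return [episodes[i] for i in idx]
```

Offspring episodes produced by crossover_episodes can be added back to the experience replay memory, and select_episodes then biases training toward high-return episodes without entirely discarding diverse, lower-return ones.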
Keywords/Search Tags:Deep reinforcement learning, sample efficiency, episodic cumulative reward, experience replay memory, genetic algorithm