
Deep Reinforcement Learning Based On Preferred Samples And Demonstrations

Posted on: 2021-01-26  Degree: Master  Type: Thesis
Country: China  Candidate: G P Xiang  Full Text: PDF
GTID: 2428330629951256  Subject: Control Science and Engineering
Abstract/Summary:
After years of development, the theory of deep reinforcement learning has gradually matured, and it has proved particularly effective on problems with high-dimensional raw inputs and on control and decision problems. However, in complex environments, deep reinforcement learning algorithms spend a great deal of time training the network, which makes them inefficient. To address this, the main contributions of this thesis are as follows.

First, because selecting samples at random for experience replay makes agent training inefficient, a prioritized experience replay algorithm based on preferred samples is proposed. A pre-training network first generates a threshold for sample selection, and only samples whose priority exceeds the threshold are admitted to the replay buffer. During training, a priority-updating scheme raises the sampling probability of higher-priority samples, so the agent learns the final goal faster. Finally, partial reward reshaping grants an additional reward to the last few steps before the final goal, so that during the final exploration phase the agent moves toward the goal faster, improving training efficiency. A minimal sketch of this buffer is given after this abstract.

Second, for experimental environments in which demonstration samples already exist, a deep inverse reinforcement learning model based on demonstrations is proposed; it uses the demonstration samples to improve training efficiency. First, a pre-training network makes the agent imitate the demonstrations as closely as possible. Second, a deep apprenticeship learning network reconstructs the reward function of the demonstrations and outputs the policy distribution over the demonstrated actions. An inverse reinforcement learning network then reconstructs the reward function of the randomly explored samples. Finally, a new loss function is built from the reconstructed reward functions and the action policy distribution and is passed to the forward deep reinforcement learning network, which improves the agent's training efficiency. A sketch of this combined loss also follows below.

Results on the Gym and Atari experimental platforms show that, compared with classic reinforcement learning algorithms, the proposed methods train faster and more efficiently in deep reinforcement learning environments, owing to the sample selection and to the new reward function constructed from demonstrations. The thesis contains 20 figures, 4 tables, and 70 references.
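The following Python sketch illustrates the preferred-sample replay idea: a threshold (assumed here to come from the pre-training stage) filters which transitions are stored, priorities are updated from TD errors, and a reward-reshaping helper adds a bonus to the last few steps before the goal. The class name, the TD-error-based priority, and the shaping parameters are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

class PreferredReplayBuffer:
    """Replay buffer that only admits transitions whose priority exceeds
    a threshold (assumed to be produced by the pre-training network) and
    samples stored transitions in proportion to their priorities."""

    def __init__(self, capacity, threshold, alpha=0.6):
        self.capacity = capacity
        self.threshold = threshold  # sample-selection threshold
        self.alpha = alpha          # how strongly priority skews sampling
        self.buffer = []
        self.priorities = []

    def add(self, transition, td_error):
        priority = abs(td_error) + 1e-6
        if priority < self.threshold:      # preferred-sample filter
            return
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

    def update_priorities(self, idx, td_errors):
        # Priority updating: samples that still carry a large TD error
        # become more likely to be drawn again.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + 1e-6

def shaped_reward(reward, steps_to_goal, bonus=1.0, horizon=5):
    """Partial reward reshaping: add a bonus to the last few steps
    before the final goal (bonus and horizon are assumed values)."""
    return reward + bonus if steps_to_goal <= horizon else reward
```

The threshold makes the buffer selective at write time, while the priority update biases reads; together they keep the agent training on the most informative transitions.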
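The second model can be sketched in the same spirit. Below, one reward network is instantiated twice, once for the demonstrations and once for the explored samples, and a combined loss adds an imitation term (matching the demonstration policy distribution) to the TD objective. The network shapes, the weighting factor lam, and every name here are assumptions for illustration; PyTorch is used only as a convenient vehicle, not because the thesis specifies it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardNet(nn.Module):
    """Reconstructs a scalar reward from a state-action pair; one copy
    plays the role of the apprenticeship-learning reward, another the
    inverse-RL reward for randomly explored samples."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def combined_loss(policy_logits, demo_actions, q_pred, q_target, lam=0.5):
    """New loss = TD error computed against targets built from the
    reconstructed rewards, plus an imitation term that pulls the policy
    toward the demonstration action distribution."""
    imitation = F.cross_entropy(policy_logits, demo_actions)
    td = F.mse_loss(q_pred, q_target)
    return td + lam * imitation

# Hypothetical usage: q_target would be computed from these networks'
# outputs rather than from the environment reward.
demo_reward_net = RewardNet(state_dim=8, action_dim=4)
explore_reward_net = RewardNet(state_dim=8, action_dim=4)
```

Passing this loss to the forward deep RL network lets the demonstration signal shape learning even before the environment reward becomes informative.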
Keywords/Search Tags: deep reinforcement learning, preferred samples, inverse reinforcement learning, demonstrations