Research On DQN (Deep Q-Network) Algorithm In Complex Environmen

Posted on: 2024-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Shi
GTID: 2568307106481884
Subject: Software engineering
Abstract/Summary:
Reinforcement learning is a branch of machine learning in which an agent learns optimal behavior through interaction with its environment. Traditional reinforcement learning algorithms store the action-value function in a table, an approach that does not scale to high-dimensional inputs. Deep reinforcement learning addresses this limitation by combining deep learning with reinforcement learning, replacing the table with a neural network to achieve broader applicability. However, deep reinforcement learning algorithms still perform poorly in complex environments. This thesis studies that problem in depth, with the aim of improving the performance of deep reinforcement learning algorithms in complex environments.

The research object chosen for this thesis is the card game Doudizhu, which exhibits several complex characteristics: decisions depend on historical information, rewards are sparse, randomness is strong, and the state-action space is huge. These characteristics are also common challenges in real-world production and everyday settings, and this thesis proposes solutions to address them.

First, to address the need for historical information and the sparsity of rewards, this thesis proposes the DGQN and DGQN_RLHF methods. DGQN adds a GRU (gated recurrent unit) neural network to the DQN algorithm so that information from previous states can be used to improve the algorithm's performance. DGQN_RLHF replaces the reward function in DGQN with training from human feedback, in order to address the problem of sparse rewards during training.

Second, to address strong randomness and the large state-action space, this thesis proposes the Soft Dueling DQN algorithm, which is based on maximum entropy and dueling networks. Entropy serves as a measure of randomness; integrating a maximum-entropy term into the DQN algorithm enables it to learn a stochastic policy even in a highly random environment. The dueling network reduces the impact of the large state-action space on the algorithm by keeping the agent from focusing on inferior actions.
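The abstract does not give the exact formulation of Soft Dueling DQN, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions. It assumes the commonly used mean-subtracted dueling aggregation, Q = V + (A - mean(A)), and the standard maximum-entropy ("soft") value, alpha * log sum exp(Q / alpha), with its matching Boltzmann policy. The function names, the temperature `alpha`, and the discount `gamma` are illustrative choices and are not taken from the thesis.

```python
import math


def dueling_q(value, advantages):
    """Combine a state value V and per-action advantages A into Q-values.

    Assumed aggregation: Q(a) = V + (A(a) - mean(A)).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def soft_value(q_values, alpha=1.0):
    """Soft state value alpha * log(sum(exp(Q / alpha))), computed stably.

    The maximum is factored out before exponentiating to avoid overflow.
    """
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def soft_policy(q_values, alpha=1.0):
    """Stochastic policy with pi(a) proportional to exp(Q(a) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def soft_td_target(reward, next_q_values, gamma=0.99, alpha=1.0, done=False):
    """One-step target that bootstraps from the soft value of the next state."""
    if done:
        return reward
    return reward + gamma * soft_value(next_q_values, alpha)


if __name__ == "__main__":
    q = dueling_q(value=1.0, advantages=[0.0, 1.0, 2.0])
    print("Q-values:", q)
    print("policy:", soft_policy(q, alpha=0.5))
    print("target:", soft_td_target(0.0, q, gamma=0.99, alpha=0.5))
```

In this sketch, a larger `alpha` pushes the policy toward uniform and a smaller `alpha` pushes it toward the greedy action. Because the policy remains stochastic rather than a hard argmax, probability mass stays spread across actions with similar values.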
Keywords/Search Tags: Deep reinforcement learning, Maximum entropy, Human feedback, Complex environment