Deep Reinforcement Learning (DRL) is an important research topic in machine learning. DRL algorithms use deep learning to extract features from input data and then apply reinforcement learning to learn a policy, taking the extracted features as the state input. However, DRL algorithms suffer from heavy computation and long training times. Asynchronous Deep Reinforcement Learning (ADRL) greatly shortens the training time of the learning model through multi-threading. Still, in game tasks based on visual perception, traditional ADRL methods cannot fully exploit the image features and image regions of high value in the later stages of training, and ADRL algorithms converge slowly on some game tasks. To address these problems, this thesis introduces a feature attention mechanism, a visual attention mechanism, and a Dyna structure optimization method with the prioritized sweeping algorithm into ADRL algorithms. The main research content is as follows:

(1) Asynchronous advantage actor-critic with a feature attention mechanism (FAM-A3C). When ADRL algorithms handle large-scale state-space tasks based on visual perception, the agent receives all the feature information of the entire original image. Because it weights all state features equally, the learning model fails to focus on valuable feature information and therefore loses important information in later training. To address this problem, a feature attention mechanism is proposed and introduced into ADRL algorithms, yielding the asynchronous advantage actor-critic with a feature attention mechanism (a minimal sketch of such a mechanism follows this abstract).

(2) Asynchronous advantage actor-critic with double attention mechanisms (DAM-A3C). ADRL algorithms with a feature attention mechanism can effectively exploit the important feature information of an image, but one problem remains: the information carried by valuable image regions is not fully utilized. To address this problem, a visual attention mechanism is introduced into the ADRL algorithm with the feature attention mechanism, and an asynchronous advantage actor-critic with double attention mechanisms is proposed. The new algorithm describes the state information of the original image along two dimensions, image features and image regions, which helps the agent learn the optimal policy efficiently (see the second sketch below).

(3) Asynchronous advantage actor-critic with Dyna architecture and prioritized sweeping (Dyna-PS-A3C). The two algorithms above improve the deep neural network model, but the performance of ADRL algorithms depends not only on the model architecture but also on the underlying reinforcement learning algorithm. To shorten the convergence time of ADRL algorithms in some visual perception tasks, the Dyna structure optimization method with the prioritized sweeping algorithm is introduced into ADRL algorithms, and an asynchronous advantage actor-critic with Dyna architecture and prioritized sweeping is proposed (a tabular sketch of this idea follows the abstract).

The improved algorithms are evaluated on Atari 2600 games and compared with existing ADRL algorithms to verify their effectiveness. The ADRL algorithms with attention mechanisms effectively exploit the important information in game images and improve learning performance. The ADRL algorithm with the Dyna structure and prioritized sweeping prevents the agent from exploring meaningless states and shortens the algorithm's convergence time.
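The abstract does not give the exact formulation of the feature attention mechanism in FAM-A3C. A common way to realize channel-wise (feature) attention over CNN feature maps is a squeeze-and-excitation-style gate, and the sketch below illustrates that idea only, assuming a PyTorch CNN backbone whose feature maps feed the A3C actor and critic heads; the class name FeatureAttention and the reduction parameter are illustrative, not the thesis's own.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Channel-wise feature attention (illustrative sketch).

    Learns a weight in (0, 1) for each CNN feature map, so the network can
    emphasize valuable features instead of weighting all features equally.
    """
    def __init__(self, num_channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze spatial dims to 1x1
        self.fc = nn.Sequential(
            nn.Linear(num_channels, num_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_channels // reduction, num_channels),
            nn.Sigmoid(),  # per-channel attention weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight each feature map by its attention weight
```

In an A3C-style network, such a module would typically sit between the convolutional feature extractor and the policy/value heads, so the gradient from the actor-critic losses trains the attention weights jointly with the rest of the model.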
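Similarly, the visual attention mechanism of DAM-A3C can be illustrated as a spatial softmax that scores image regions, composed with the channel attention above so that the state is described along both dimensions (features and regions). This is a hedged sketch under the assumption that the two mechanisms are applied sequentially, feature attention first; the abstract does not specify the composition order, and VisualAttention and DoubleAttention are hypothetical names. It reuses FeatureAttention from the previous sketch.

```python
import torch
import torch.nn as nn

class VisualAttention(nn.Module):
    """Spatial (visual) attention (illustrative sketch).

    Scores each spatial location of the feature maps and normalizes the
    scores with a softmax, so valuable image regions contribute more.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        self.score = nn.Conv2d(num_channels, 1, kernel_size=1)  # region score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        a = torch.softmax(self.score(x).view(b, -1), dim=1).view(b, 1, h, w)
        return x * a  # emphasize valuable regions

class DoubleAttention(nn.Module):
    """Composes feature (channel) and visual (spatial) attention.

    Assumed order: channel attention first, then spatial attention; the
    abstract does not state how the two mechanisms are combined.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        self.feature = FeatureAttention(num_channels)  # from the sketch above
        self.visual = VisualAttention(num_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.visual(self.feature(x))
```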
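For Dyna-PS-A3C, the Dyna structure with prioritized sweeping is a classical model-based idea: real transitions update a learned model, and planning steps replay the state-action pairs with the largest TD error first, propagating value changes backwards instead of sweeping states uniformly. The tabular sketch below illustrates that classical algorithm only; the env interface (reset/step/actions) is a hypothetical stand-in, and the thesis's actual integration with A3C's neural networks is not shown.

```python
import heapq
import itertools
import random
from collections import defaultdict

def dyna_prioritized_sweeping(env, episodes=100, n_planning=10,
                              alpha=0.1, gamma=0.99, epsilon=0.1, theta=1e-4):
    """Tabular Dyna with prioritized sweeping (illustrative sketch)."""
    Q = defaultdict(float)
    model = {}                        # (s, a) -> (reward, next_state, done)
    predecessors = defaultdict(set)   # next_state -> {(s, a)} leading into it
    pqueue, tie = [], itertools.count()  # max-heap via negated priority

    def target(r, s2, done):
        # One-step TD target; no bootstrap from terminal states.
        return r if done else r + gamma * max(Q[(s2, b)] for b in env.actions(s2))

    def push(s, a, r, s2, done):
        # Priority is the magnitude of the TD error for (s, a).
        p = abs(target(r, s2, done) - Q[(s, a)])
        if p > theta:
            heapq.heappush(pqueue, (-p, next(tie), (s, a)))

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            acts = env.actions(s)
            a = (random.choice(acts) if random.random() < epsilon
                 else max(acts, key=lambda b: Q[(s, b)]))  # epsilon-greedy
            s2, r, done = env.step(a)
            model[(s, a)] = (r, s2, done)      # learn the model
            predecessors[s2].add((s, a))
            push(s, a, r, s2, done)
            # Planning: update the highest-priority pairs first, then
            # re-prioritize their predecessors so changes sweep backwards.
            for _ in range(n_planning):
                if not pqueue:
                    break
                _, _, (ps, pa) = heapq.heappop(pqueue)
                pr, ps2, pdone = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (target(pr, ps2, pdone) - Q[(ps, pa)])
                for (qs, qa) in predecessors[ps]:
                    qr, qs2, qdone = model[(qs, qa)]
                    push(qs, qa, qr, qs2, qdone)
            s = s2
    return Q
```

Because only pairs whose TD error exceeds the threshold theta ever enter the queue, planning effort is spent where value estimates are actually changing, which is the mechanism the abstract credits for avoiding meaningless states and shortening convergence time.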