Deep reinforcement learning combines deep neural networks with reinforcement learning methods, giving agents strong perception capability and good autonomous decision-making ability. In recent years, deep reinforcement learning has produced many research results across multiple fields and attracted widespread attention. In practical applications, however, the parameters of the real environment differ significantly from those of the simulation environment, which degrades the learning effect of the agent and may even lead to irreversible losses. Improving the robustness of the agent is therefore an urgent problem. This thesis proposes to enhance the robustness of the agent by optimizing the curiosity exploration mechanism, adding auxiliary task incentives, and exploring abnormal tasks. The main research contents are summarized as follows:

(1) To address the agent's insufficient exploration ability, a curiosity-driven algorithm based on adversarial learning is proposed. The algorithm builds on an adversarial learning framework in which the main agent is updated with a general policy update while the adversarial agent is updated through a curiosity mechanism. In addition, a teacher model is constructed to optimize the internal reward function, driving the agent toward more efficient and robust learning through effective adversarial interactions. In this way, the main agent can explore unknown state spaces while avoiding irrational behavior when it encounters unknown states.
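As a minimal sketch of how such a curiosity bonus can be computed, the following Python snippet scores transitions by the prediction error of a learned forward model; the network sizes, the reward scale eta, and the function names are illustrative assumptions rather than the thesis implementation.

import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Predicts the next state feature from the current feature and action."""
    def __init__(self, feat_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim))

    def forward(self, feat, action):
        return self.net(torch.cat([feat, action], dim=-1))

def curiosity_reward(forward_model, feat, action, next_feat, eta=0.1):
    # Intrinsic reward: scaled prediction error of the forward model;
    # poorly predicted (novel) transitions receive a larger bonus.
    pred = forward_model(feat, action)
    return eta * (pred - next_feat).pow(2).mean(dim=-1)

In this kind of scheme the adversarial agent adds the bonus to its extrinsic reward, so states that the forward model predicts poorly are visited more often.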
(2) The randomness and uncertainty of the adversarial samples generated during adversarial learning often prevent the agent from converging in the later stages of learning. There is consequently a risk of unpredictable attacks, making it difficult for the agent to maintain stable performance in adversarial environments. To address this problem, the thesis proposes a robust deep reinforcement learning method based on auxiliary task incentives, which categorizes auxiliary tasks into positive tasks and robust tasks. Positive tasks stimulate the agent to explore non-explicit information in the environment and obtain higher rewards, while robust tasks replace adversarial samples by incentivizing the agent to complete attack-resistant auxiliary tasks during training, thereby enhancing the agent's robustness. The algorithm also introduces a robustness evaluation network that dynamically generates auxiliary coefficients for selecting the different types of auxiliary tasks, improving the training effect.

(3) The two methods above only simulate abnormal environments during training, which still falls short of real abnormal environments. To address this problem, the thesis proposes a robust deep reinforcement learning method based on abnormal task exploration, which transforms auxiliary tasks into main tasks. Training proceeds in two stages: offline learning and online learning. During offline learning, samples are filtered and the action network is pruned. During online learning, the elastic weight consolidation algorithm restricts the update magnitude of the agent's network, the Markov Decision Distance between the target student model and the teacher model is controlled to drive the agent to explore in the correct direction, and the Q-value evaluation of the model is modified to match the multi-task scenario, improving the agent's robustness. In addition, the algorithm treats other training tasks as abnormal tasks of the current task, so the agent can explore different abnormal environments and further enhance its robustness.

In summary, the thesis improves the performance of the agent by enhancing its robustness through an optimized curiosity exploration mechanism, auxiliary task incentives, and abnormal task exploration. The feasibility of the proposed algorithms is demonstrated through experimental analysis.
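A minimal sketch of the elastic weight consolidation penalty mentioned in (3) is given below: it constrains online updates with a quadratic penalty around the offline-trained weights. The diagonal Fisher estimate, the lambda value, and the variable names are assumptions made for the sketch, not the thesis implementation.

import torch

def ewc_penalty(model, offline_params, fisher, lam=10.0):
    # Quadratic penalty, weighted by a diagonal Fisher information estimate,
    # that keeps the online network close to its offline-trained weights.
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - offline_params[name]).pow(2)).sum()
    return 0.5 * lam * loss

# During online learning the penalty is added to the usual RL objective:
#   total_loss = rl_loss + ewc_penalty(policy_net, offline_params, fisher_diag)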