Reinforcement learning has achieved remarkable results in decision-making tasks such as game AI and robot control. However, traditional reinforcement learning methods require a large amount of exploration in tasks with complex structure, large spatial scale, and sparse rewards, resulting in low sample efficiency and slow convergence to optimal policies. To improve the exploration efficiency and sample efficiency of agents, researchers have introduced option and skill discovery frameworks into reinforcement learning, which learn reusable options or skills. Although option-based reinforcement learning can already solve single goal-oriented problems with simple structure, it cannot solve phase goal-oriented reinforcement learning problems that combine multiple goals, because the state representation capability of the policy network is insufficient. In addition, since option-based methods are complex and their training is unstable, many researchers have turned to skill-based reinforcement learning methods with simpler models. However, skill-based methods struggle to learn skills directly in phase goal-oriented tasks: when learning skills under sparse rewards, additional human intervention or demonstration data is required, which demands a rich knowledge background, and demonstration data is difficult to obtain. To address these issues, this paper proposes two models, combining recurrent neural networks with option-based methods and contrastive learning with skill-based methods. The main research content is as follows:

(1) Option-based reinforcement learning methods have difficulty solving phase goal-oriented reinforcement learning problems that combine multiple goals because the state representation capability of the policy network is insufficient. Therefore, a recurrent state representation based option-critic method is proposed. It introduces a long short-term memory (LSTM) network that encodes the state at each time step together with the hidden state of the previous time step, incorporating past state information into each state. This enhances the policy network's ability to represent global task information and allows better option policies to be learned for phase goal-oriented problems. Finally, experiments in the grid-world environment confirm the good performance of the proposed model.
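For illustration, the following is a minimal sketch of the recurrent encoding idea described above, written in PyTorch; all class, parameter, and dimension names here are hypothetical and do not reproduce the thesis implementation.

    import torch
    import torch.nn as nn

    # Hypothetical sketch: an LSTM cell folds the previous hidden state into
    # the encoding of each new observation, so the intra-option policy
    # conditions on history rather than on a single raw state.
    class RecurrentOptionPolicy(nn.Module):
        def __init__(self, obs_dim, hidden_dim, num_options, num_actions):
            super().__init__()
            self.lstm = nn.LSTMCell(obs_dim, hidden_dim)
            # One intra-option policy head per option.
            self.heads = nn.ModuleList(
                [nn.Linear(hidden_dim, num_actions) for _ in range(num_options)]
            )

        def forward(self, obs, option, state):
            h, c = self.lstm(obs, state)    # mix current obs with history
            logits = self.heads[option](h)  # action logits under this option
            return logits, (h, c)

    policy = RecurrentOptionPolicy(obs_dim=16, hidden_dim=64,
                                   num_options=4, num_actions=5)
    state = (torch.zeros(1, 64), torch.zeros(1, 64))   # initial (h, c)
    logits, state = policy(torch.randn(1, 16), option=0, state=state)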
(2) Skill-based reinforcement learning methods have difficulty learning skills directly in phase goal-oriented reinforcement learning tasks, so under sparse-reward conditions they introduce too much prior knowledge or human intervention. Therefore, a group-wise contrastive learning based sequence-aware skill discovery method is proposed. First, trajectory group-wise contrastive learning is used to learn embedding representations of skills, reducing the need for human intervention (a minimal sketch of such an objective is given at the end of this abstract). Then, sequential skill embeddings are computed on segmented trajectories during both training and testing and are combined with the policy network, so that sequential skill policies can be trained and used efficiently to solve phase goal-oriented reinforcement learning tasks. Finally, experiments in the grid-world and particle-control environments confirm the effectiveness of the proposed method.

(3) To meet researchers' needs to interact conveniently with a front-end interface during reinforcement learning research, adjust model parameters, save models, and intuitively view the final performance of an algorithm, an agent motion control demonstration system is designed and implemented based on the research content of this paper. The system provides researchers with a simple, easy-to-use interactive interface for training and demonstrating multiple algorithms.
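As referenced in point (2), the following is a minimal sketch of a trajectory group-wise contrastive objective in PyTorch, written as an InfoNCE-style loss: segment embeddings that share a group label (the same candidate skill) are treated as positives, all other segments as negatives. The function and variable names are assumptions for illustration, not the exact loss used in the thesis.

    import torch
    import torch.nn.functional as F

    # Hedged sketch: pull together embeddings of trajectory segments from the
    # same group (candidate skill) and push apart segments from other groups.
    def group_contrastive_loss(embeddings, group_ids, temperature=0.1):
        z = F.normalize(embeddings, dim=1)   # (N, D) unit vectors
        sim = z @ z.t() / temperature        # pairwise cosine similarities
        self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
        pos = group_ids.unsqueeze(0).eq(group_ids.unsqueeze(1)) & ~self_mask
        log_prob = F.log_softmax(sim.masked_fill(self_mask, float('-inf')), dim=1)
        # average log-probability assigned to each anchor's positives
        loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
        return loss.mean()

    emb = torch.randn(8, 32)                      # 8 segment embeddings
    ids = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])  # group (skill) labels
    print(group_contrastive_loss(emb, ids))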