Research On Reinforcement Learning Technology Of Agent Based On Cognitive Behavioral Knowledge

Posted on: 2022-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: J X Li
GTID: 2558307169479774
Subject: Control engineering
Abstract/Summary:
Limited by sampling efficiency, a reinforcement learning agent struggles to learn an optimal policy from scratch when it faces complex tasks involving high-dimensional continuous state spaces, sparse rewards, or multi-agent collaboration. How to express existing knowledge in a form that agents can understand and use, and how to apply it to accelerate policy learning, remains a difficult problem. First, this thesis proposes a deep reinforcement learning framework based on a cognitive behavior model, in which prior domain knowledge is modeled as a cognitive behavior model built on Belief-Desire-Intention (BDI). Then, based on this framework, deep reinforcement learning algorithms are proposed for single-agent and multi-agent settings, and the way the cognitive behavior model guides the agent's policy updates is designed quantitatively. Finally, a reinforcement learning prototype system based on the cognitive behavior model is designed, and the effectiveness of the proposed method is verified on a UAV reconnaissance path planning task. The main work and innovations are as follows:

1. A reinforcement learning framework based on cognitive behavior models is proposed. First, cognitive behavior knowledge is constructed as a cognitive behavior model based on the BDI agent model to provide dynamic guidance for learning. On this basis, a single-agent reinforcement learning architecture and a multi-agent reinforcement learning architecture, both based on the cognitive behavior model, are proposed. In addition, the functional modules of the proposed architectures and the interactions between them are designed in detail, providing the architectural basis for the algorithms developed in the rest of the thesis.

2. A single-agent heuristically accelerated deep reinforcement learning algorithm is designed and implemented. To alleviate the impact of high-dimensional state spaces and sparse rewards on learning efficiency, a heuristically accelerated deep Q network (HADQN) is proposed. First, the form and principle of the heuristic strategy are designed so that cognitive behavior is combined with the learning process. Second, a heuristic strategy network that fits the cognitive behavior knowledge is constructed and integrated into the deep Q network. Third, the update method of the heuristic strategy network and its dynamic guidance of learning are designed. Finally, experiments in typical Gym environments and in a StarCraft II environment verify that the algorithm can dynamically extract effective cognitive behavior knowledge as the environment changes and can accelerate the convergence of the agent's policy with the help of the heuristic strategy network.

3. A multi-agent heuristically accelerated collaborative reinforcement learning algorithm is designed and implemented. To address the sharp growth in state-space dimension and the partial observability of each agent in a multi-agent environment, a heuristically accelerated QMIX network (HAQMIX) is proposed. First, a heuristically accelerated DRQN network with a GRU core is designed to address the difficulty of determining the agent's state in partially observable environments. Second, a joint value network based on a mixing network is designed for the heuristically accelerated agents, which benefits from the stable training of centralized training with decentralized execution. Finally, experiments in the StarCraft 3M environment verify that the algorithm can apply the knowledge provided by the model to the learning process and can efficiently use correct knowledge to accelerate the convergence of the agents' policies.

4. A prototype system is designed, implemented, and verified on purpose-built cases. Based on the framework and algorithms above, a prototype system supporting multiple learning environments is designed and implemented. First, the system's functional modules are built and integrated: an environment selection module, a model setting module, an algorithm configuration module, a simulation testing module, and a decision application module. Second, single-agent and multi-agent learning environments based on UAV reconnaissance path planning tasks are designed and implemented. Finally, the performance of the system in these environments is demonstrated, verifying the validity and advantages of the proposed framework and the implemented algorithms.
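
The abstract does not give the formula by which the heuristic guides action selection, so the following is only a minimal sketch of one common formulation from heuristically accelerated Q-learning, not the thesis's HADQN. It assumes the agent picks the action maximizing Q(s, a) + xi * H(s, a), where H is nonzero only for the action suggested by prior knowledge. The names `heuristic_values`, `select_action`, `xi`, and `eta` are hypothetical, and the suggested action is passed in directly rather than produced by a BDI model or a heuristic strategy network.

```python
import numpy as np


def heuristic_values(q_values, suggested_action, eta=1.0):
    """Return H(s, .) for one state (illustrative assumption, not from the thesis).

    H is zero everywhere except at the suggested action, where it is just
    large enough (by margin eta) to lift that action above the current
    greedy action.
    """
    q = np.asarray(q_values, dtype=float)
    h = np.zeros_like(q)
    h[suggested_action] = q.max() - q[suggested_action] + eta
    return h


def select_action(q_values, suggested_action, xi=1.0, epsilon=0.0, rng=None):
    """Epsilon-greedy selection over Q(s, a) + xi * H(s, a).

    xi weights the heuristic: xi = 0 ignores the prior knowledge entirely,
    xi = 1 makes the suggested action win whenever the agent acts greedily.
    """
    rng = rng or np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action.
        return int(rng.integers(len(q)))
    h = heuristic_values(q, suggested_action)
    return int(np.argmax(q + xi * h))


if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]  # toy Q-values for three actions
    print(select_action(q, suggested_action=0, xi=1.0))  # heuristic followed
    print(select_action(q, suggested_action=0, xi=0.0))  # heuristic ignored
```

In this sketch the heuristic only biases which action is chosen and leaves the Q-values themselves untouched, so wrong advice can still be outgrown as learning proceeds. Decaying `xi` over training is one possible way to shift from knowledge-guided to learned behavior; whether the thesis does this is not stated in the abstract.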
Keywords/Search Tags: Cognitive Interaction Model, Reinforcement Learning, Multi-Agent, Heuristically Accelerated Algorithm