
Research On The Strategy Of Multi-robot Encirclement And Cooperation Based On Deep Reinforcement Learning

Posted on: 2024-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z K Gao
GTID: 2568307064986009
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the continued development of multi-robot control technology, multi-robot pursuit-evasion has become a representative problem in multi-robot cooperative control. It has wide applications in post-disaster search and rescue, simulated battlefield ground-cooperation exercises, and multi-UAV collaborative operations. Deep reinforcement learning is an end-to-end, trial-and-error learning method. Compared with traditional methods, it is more adaptive and handles nonlinear problems and high-dimensional state and action information more effectively. This enables robots to learn control strategies autonomously and to adapt to control tasks in complex environments. This thesis therefore uses deep reinforcement learning to study the multi-robot pursuit-evasion problem, with the aim of improving the learning efficiency of the pursuit robots' strategy. The main research contents are as follows:

(1) In realistic pursuit scenarios, pursuit robots must interact frequently with the real environment to collect a large amount of experience data, which hinders strategy learning. Reducing the training cost of learning pursuit strategies, including computational resources and energy consumption, remains an open issue in this field. In addition, the robots can observe only local information about the environment, which creates an incomplete-information problem. This thesis therefore proposes a model-based centralized pursuit strategy. It applies kinematic models and trajectory prediction models to provide prior knowledge for pre-training the reinforcement learning model. This reduces the time the pursuit robots spend interacting with the real environment and the cost of collecting experience data, and so accelerates strategy learning. Centralized training partially alleviates the incomplete-information problem caused by the robots' local observation.

(2) In real-world pursuit missions, training control policies for pursuit robots with deep reinforcement learning requires a great deal of repetitive exploration of unknown environments. As a result, experience samples are poorly utilized, and how to use existing experience data to achieve a higher sample utilization rate is worth investigating. Furthermore, to alleviate the slow response caused by centralized training, this thesis proposes a model-based distributed pursuit strategy. It uses an environment state-transition dynamics model that is updated with the experience data collected by the pursuit robots while the control policy is being trained. The model supplies additional training data for the control policy, which improves the capture success rate and the sample utilization rate. Distributed execution improves the response speed of individual pursuit robots.

(3) Most existing multi-agent experimental environments for pursuit-evasion are two-dimensional and suffer from poor openness, scalability, and portability. This thesis builds a multi-robot pursuit-evasion experimental environment and a multi-robot deep reinforcement learning framework on the Webots robot simulation platform. This provides a reliable experimental environment for studying multi-robot pursuit-evasion problems, and experiments demonstrate the effectiveness of the proposed methods in terms of capture success rate and average reward.
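The idea in contribution (2), a dynamics model that is updated from collected experience and then supplies extra training data for the policy, can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so the sketch below substitutes tabular Dyna-Q, a standard model-based method, on a hypothetical 1-D track where a single pursuer must reach a stationary evader. The environment, all names, and all hyperparameters are assumptions made for illustration and are not taken from the thesis.

```python
import random

N_STATES = 8        # cells on a 1-D track; the evader sits in the last cell (toy assumption)
ACTIONS = (-1, 1)   # move left / move right


def step(state, action):
    """Hypothetical 'real' environment: reward 1 when the pursuer reaches the evader."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def update(q, s, a, r, n, done, alpha, gamma):
    """One Q-learning backup, used for both real and model-generated transitions."""
    target = r if done else r + gamma * max(q[(n, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def greedy(q, state, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=30, planning_steps=10, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}       # learned dynamics: (state, action) -> (next_state, reward, done)
    real_steps = 0   # number of interactions with the 'real' environment
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            explore = rng.random() < eps
            action = rng.choice(ACTIONS) if explore else greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            real_steps += 1
            update(q, state, action, reward, nxt, done, alpha, gamma)
            # Update the dynamics model with the newly collected experience.
            model[(state, action)] = (nxt, reward, done)
            # Replay model-generated transitions as additional training data.
            for _ in range(planning_steps):
                (s, a), (n, r, d) = rng.choice(list(model.items()))
                update(q, s, a, r, n, d, alpha, gamma)
            state = nxt
    return q, real_steps


if __name__ == "__main__":
    q, real_steps = train()
    policy = [greedy(q, s, random.Random(0)) for s in range(N_STATES - 1)]
    print("real environment steps:", real_steps)
    print("greedy action per cell:", policy)
```

Each real step here is reused through `planning_steps` extra backups drawn from the learned model, which is the sense in which a model can raise sample utilization. A multi-robot, deep-network version would replace the table and the dictionary model with learned function approximators.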
Keywords/Search Tags: multi-robot pursuit, deep reinforcement learning, cooperative strategy, policy learning efficiency, sample utilization rate