| Traditional agent formation approaches depend strongly on an agent dynamics model, which makes them difficult to apply in complex working scenarios. In this thesis, a multi-agent localization method and control algorithm are designed on the basis of deep reinforcement learning. The multi-agent deep deterministic policy gradient (MADDPG) algorithm is used to guide the agents' movement decisions. Localization, formation organization, obstacle avoidance and fault-tolerant control of multiple agents are realized. The main research work and innovations of this thesis are as follows: (1) A localization method combining the least squares method with a multi-layer perceptron is proposed. The method uses three-dimensional positioning data from four observation stations to determine the position of each agent in three-dimensional space by the iterative least squares method, which solves the indoor positioning problem of the agents. To account for possible measurement errors (for example, when agents block each other or obstacles block the view of an observation station), the multi-layer perceptron is used to correct the errors. Simulation results show that the positioning accuracy is maintained at the millimeter level. (2) A multi-agent formation method based on the MADDPG algorithm is designed. This method uses a new formation reward function that constrains only the topology between agents and does not specify the position of each agent in the formation. In addition, to handle problems such as agents going offline in practical work, a compression method based on knowledge distillation is adopted. This method uses a teacher-student model to integrate multiple formation strategies into a single model and thereby realizes fault-tolerant control. Simulation results show that the multi-agent formation method can accomplish the tasks of tracking, formation, obstacle avoidance and fault-tolerant control. It is superior to the traditional artificial 
potential field method in terms of convergence speed, path optimization and operating efficiency. The method resolves the convergence problems of independent reinforcement learning algorithms and balances obstacle avoidance against formation maintenance. (3) An intelligent robot is designed to serve as an agent, and the control system and control algorithm are applied to it. The data required by the algorithm are generated by positioning equipment. The results show that the proposed algorithm is suitable for the actual system and can guide the robot to make decisions and accomplish the formation tasks. |
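
The iterative least squares localization step in contribution (1) can be illustrated with a minimal sketch. The abstract does not state the measurement model, so the code assumes that each of the four observation stations supplies a range (distance) measurement to the agent, and it uses a plain Gauss-Newton iteration. The function name `locate`, the station coordinates and the test position are hypothetical, and the multi-layer perceptron error correction is not included.

```python
import numpy as np


def locate(stations, ranges, x0=None, iters=50, tol=1e-12):
    """Estimate a 3-D position from station ranges by Gauss-Newton least squares.

    Assumption (not from the thesis): each station reports its distance to the agent.
    """
    stations = np.asarray(stations, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    # Start from the centroid of the stations unless a guess is supplied.
    x = stations.mean(axis=0) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iters):
        diff = x - stations                          # shape (n, 3)
        dist = np.maximum(np.linalg.norm(diff, axis=1), 1e-12)
        residual = dist - ranges                     # predicted minus measured range
        jacobian = diff / dist[:, None]              # d(dist_i)/dx
        # Solve the linearized least squares problem for the update step.
        step, *_ = np.linalg.lstsq(jacobian, -residual, rcond=None)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x


# Hypothetical, non-coplanar station layout (metres) and a noise-free test case.
stations = np.array([[0.0, 0.0, 0.0],
                     [5.0, 0.0, 0.0],
                     [0.0, 5.0, 0.0],
                     [0.0, 0.0, 3.0]])
true_pos = np.array([1.2, 2.3, 0.8])
ranges = np.linalg.norm(stations - true_pos, axis=1)
est = locate(stations, ranges)
```

With noise-free ranges the estimate should coincide with the true position up to numerical precision. With real measurements the residual would not vanish, which is where a learned correction such as the multi-layer perceptron described above would come in.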