| In recent years, mobile robots have become an important research area in artificial intelligence and are widely used in daily life. In the exploration of unknown environments, mobile robots can replace humans in collecting data in complex and dangerous settings, and they are becoming increasingly automated and intelligent. Autonomous path planning in an unknown environment is the core problem of robot exploration tasks. To improve the decision-making ability of mobile robots and to explore the environment fully, this thesis establishes a Markov decision process framework and develops environmental exploration methods based on environmental field information and extremum features. The main research contents of this thesis are as follows: (1) A Markov decision process model based on the reinforcement learning paradigm is established for exploring unknown environments, forming a theoretical framework for autonomous informative path planning of mobile robots. Within this framework, a mobile robot makes sequential action decisions and interacts with the unknown environment to complete the exploration task. (2) For the problem of exploring the global information of an environmental field, an exploration method based on the upper confidence bound is developed; it improves the accuracy and efficiency of environmental feature prediction by balancing the robot's exploration and exploitation. The method uses action-value evaluation to process the information obtained from the interaction between the robot and the environment, and computes the uncertainty of each action from the robot's action history, which enables real-time planning. Taking the Gaussian single-extremum field and the Ackley multi-extremum field as examples, numerical simulations compare the method with other path planning methods to verify its effectiveness. Finally, a Turtlebot2 robot is used to verify the upper-confidence-bound-based informative path planning method in a real environment. (3) For the problem of exploring local extremum features of an environmental field, a path planning method based on the DDQN algorithm is developed that accounts for the limited energy carried by mobile robots in practical applications. In this method, the reward function is designed according to the gradient change of the environmental information, and action decisions are made under an energy consumption constraint. Numerical simulations on the Gaussian single-extremum field and the Ackley multi-extremum field show that the proposed method achieves a high success rate. |
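
The balance between exploration and exploitation described in item (2) can be illustrated with a minimal sketch of upper-confidence-bound action selection. The abstract does not state the exact formulation used in the thesis, so the code below assumes the textbook UCB1 rule: each action is scored by its estimated value plus an uncertainty bonus that shrinks as the action is tried more often. The function names `ucb_select` and `update` and the exploration weight `c` are hypothetical and chosen only for illustration.

```python
import math


def ucb_select(q_values, counts, t, c=2.0):
    """Return the index of the action with the highest UCB score.

    q_values: running mean reward of each action (action-value estimate)
    counts:   number of times each action has been taken so far
    t:        current step number (>= 1)
    c:        exploration weight (assumed value, not taken from the thesis)
    """
    # An action that has never been tried has unbounded uncertainty,
    # so it is selected before any scoring takes place.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    # Score = estimated value + uncertainty bonus from the action history.
    scores = [
        q + c * math.sqrt(math.log(t) / n)
        for q, n in zip(q_values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(q_values, counts, action, reward):
    """Update the running mean reward of the chosen action in place."""
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]
```

In a field-exploration setting, the actions would typically be candidate moves of the robot and the reward some measure of the information gained at the new location; both of these mappings are assumptions here, not details given in the abstract.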