
Research On Visual Navigation Based On The First-person Perspective

Posted on: 2024-06-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Fu
Full Text: PDF
GTID: 2568306920950799
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence, research achievements in fields such as computer vision, machine learning, and robotics have been widely applied to traditional industries such as manufacturing and to many aspects of social life. As an interdisciplinary field combining robotics and computer vision, visual navigation lets a robot couple visual perception with spatial mobility, making it possible to interact intelligently with people and environments. Although map-based visual navigation methods (such as VSLAM) achieve good results in practice, constructing maps is time-consuming and labor-intensive, and optimizing the pipeline of state estimation, planning, and control is challenging. With the rise of deep learning, map-less visual navigation folds these processes into end-to-end models and improves the efficiency and flexibility of navigation tasks. Most current work concentrates on improving the accuracy and efficiency of visual navigation and on meeting diverse task demands in practical applications, yet two essential problems remain under-addressed. First, visual navigation has made notable progress in simulation environments, but deployment in the real world remains largely unexplored. Second, for agents navigating a large area with reinforcement learning, it is difficult to define the reward properly and to cope with the sparse-reward problem.

To solve the task of reaching a target across a large-scale area, we propose a self-adaptive dual-mode visual navigation method (DMVN) that combines a target-driven visual navigation mode and a visual driving mode through a designed dynamic mode transfer network. When running in real-world scenes, the dynamic mode transfer network decides which of the two modes dominates, depending on the agent's position and its surroundings. The target-driven visual navigation mode, trained with deep reinforcement learning, is suited to finding targets within a limited area, while the visual driving mode, trained with conditional imitation learning, takes charge of long-range navigation and exploration of the environment. The two modes and the dynamic mode transfer network run together during inference, forming an integrated visual navigation system. The dual-mode self-adaptive visual navigation method is deployed on mobile robots and runs in the real world.

This thesis collects real-world images to construct discrete grid scenes, which are used for trajectory data generation and reinforcement learning training; similarly, conditional imitation learning is trained on a long-path dataset collected from expert demonstrations in the real world. In dual-mode visual navigation, the reinforcement learning and imitation learning models are first trained in simulation environments (AI2-THOR, CARLA) and then further trained on the real-world datasets. Real-scene training proceeds in two stages: the first stage trains the deep reinforcement learning and conditional imitation learning models, and the second stage trains the dynamic mode transfer network. Experiments show that DMVN can accomplish target-driven tasks with trajectory lengths of 60 m at a higher success rate in the real world, so our method expands the range of the search area at scale while preserving efficiency in time cost and trajectory length.
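To make the mode-switching idea concrete, the sketch below illustrates how a dynamic mode transfer network could arbitrate between the two policies at inference time. The thesis does not publish an implementation, so the module names, network sizes, discrete action interface, and the image-goal input are illustrative assumptions rather than the author's code.

```python
# Minimal sketch of dual-mode inference (all names and shapes are assumptions).
import torch
import torch.nn as nn


class ImageEncoder(nn.Module):
    """Small CNN that turns an RGB observation into a feature vector."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, out_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class ModeTransferNet(nn.Module):
    """Dynamic mode transfer: scores the two modes from current and goal views."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.head = nn.Linear(2 * feat_dim, 2)  # logits: [target-driven, visual driving]

    def forward(self, obs_feat, goal_feat):
        return self.head(torch.cat([obs_feat, goal_feat], dim=-1))


class TargetDrivenPolicy(nn.Module):
    """Target-driven navigation policy (trained with deep RL in the thesis)."""
    def __init__(self, feat_dim=128, n_actions=4):
        super().__init__()
        self.head = nn.Linear(2 * feat_dim, n_actions)

    def forward(self, obs_feat, goal_feat):
        return self.head(torch.cat([obs_feat, goal_feat], dim=-1))


class VisualDrivingPolicy(nn.Module):
    """Long-range driving policy (trained with conditional imitation learning)."""
    def __init__(self, feat_dim=128, n_actions=4):
        super().__init__()
        self.head = nn.Linear(feat_dim, n_actions)

    def forward(self, obs_feat):
        return self.head(obs_feat)


def dmvn_step(encoder, transfer, td_policy, drive_policy, obs, goal):
    """One inference step: mode transfer picks the dominant mode, which picks the action."""
    obs_feat, goal_feat = encoder(obs), encoder(goal)
    mode = transfer(obs_feat, goal_feat).argmax(dim=-1)   # 0: target-driven, 1: driving
    td_act = td_policy(obs_feat, goal_feat).argmax(dim=-1)
    drive_act = drive_policy(obs_feat).argmax(dim=-1)
    return torch.where(mode == 0, td_act, drive_act), mode


if __name__ == "__main__":
    enc, mt = ImageEncoder(), ModeTransferNet()
    td, dr = TargetDrivenPolicy(), VisualDrivingPolicy()
    obs = torch.rand(1, 3, 96, 96)    # current first-person view
    goal = torch.rand(1, 3, 96, 96)   # image of the navigation target
    action, mode = dmvn_step(enc, mt, td, dr, obs, goal)
    print(action.item(), mode.item())
```

In this sketch the two policies always compute their preferred action and the transfer network merely selects between them, which mirrors the description that the dual modes and the mode transfer network run together during inference; in a real deployment the non-dominant branch could simply be skipped to save computation.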
Keywords: Visual Navigation, Deep Reinforcement Learning, Conditional Imitation Learning, Mobile Robotics