Research And Implementation Of Multimodal Algorithm For Path Decision-making In Visual Language Navigation System

Posted on: 2022-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: J R Zou
GTID: 2518306497452124
Subject: Master of Engineering
Abstract/Summary:
Visual language navigation is a cross-modal task that integrates computer vision and natural language processing. It requires a model to process and relate information from two different formats, images and natural language, and to use that information to complete a navigation task in a simulated real-world 3D environment. At present, most related studies try to improve the performance of visual language navigation models by processing image and language information more effectively or by improving the navigation algorithm, while overlooking the possibility that an intelligent robot can obtain more information from the environment. After analyzing the natural language instructions in the dataset for this task, we found that region information accounts for a considerable proportion of the instructions: on average, each instruction contains about two region-related words. Combined with our everyday experience of navigating by following instructions, this observation motivates the model proposed in this paper, which uses region information to assist navigation. The proposed region information model integrates the current region, obtained from the image, with the next region, predicted from the natural language instruction. This cross-modal information is used as prior information to assist both the training of the navigation model and the navigation of the intelligent robot. Experiments on several open-source visual language navigation models show that using region information to assist training and navigation improves the navigation success rate and, in particular, the successful-path-length metric, a key indicator for the task. Adding region information also improves the performance of the navigation model in unfamiliar environments. In addition, most research on visual language navigation has been conducted in English. Building on existing results, this paper also processes the dataset in Chinese, carries out research on the Chinese visual language navigation task, and achieves good performance.
Keywords/Search Tags: Visual language navigation, region information, reinforcement learning, cross-modality, Chinese navigation
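
The dataset analysis mentioned in the abstract (counting how often region words appear per instruction) could be sketched roughly as follows. This is only an illustrative sketch, not the thesis's actual procedure: the region vocabulary `REGION_TERMS`, the function names, and the sample instructions are all hypothetical, since the abstract does not list the terms or tooling that were used.

```python
import re

# Hypothetical region vocabulary; the thesis's actual word list is not given.
REGION_TERMS = [
    "living room", "dining room", "kitchen",
    "bedroom", "bathroom", "hallway", "office",
]

# Longer terms first so multi-word regions are matched as one mention.
_PATTERN = re.compile(
    r"\b(" + "|".join(
        re.escape(t) for t in sorted(REGION_TERMS, key=len, reverse=True)
    ) + r")\b",
    re.IGNORECASE,
)


def count_region_mentions(instruction: str) -> int:
    """Return the number of region terms found in one instruction."""
    return len(_PATTERN.findall(instruction))


def mean_region_mentions(instructions: list[str]) -> float:
    """Return the average number of region mentions per instruction."""
    if not instructions:
        return 0.0
    total = sum(count_region_mentions(text) for text in instructions)
    return total / len(instructions)


# Made-up sample instructions, for illustration only.
sample = [
    "Walk out of the bedroom and turn left into the hallway.",
    "Go past the kitchen, then stop in the Dining Room.",
    "Turn right and wait by the sofa.",
]
average = mean_region_mentions(sample)
```

A simple term list with whole-word matching is used here only to keep the sketch short; plural forms and synonyms would need extra handling in a real analysis.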