
Research On Method Of Robot Vision Scene Understanding

Posted on: 2023-03-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: P Tian
GTID: 1528306941490334
Subject: Control Science and Engineering
Abstract/Summary:
The rapid development of computer vision and artificial intelligence has promoted the adoption of intelligent terminal devices, and human society is moving ever faster toward informatization and intelligence. Although modern computer vision has made remarkable progress in perceiving visual scenes, the understanding of visual scenes is still far from the level required for practical application and widespread use. Perceiving and understanding visual scenes, so that increasingly common intelligent mobile terminals can interpret what they observe and further the intelligent development of society, has therefore long been a research hotspot in artificial intelligence and related fields. This thesis focuses on understanding the visual scene and progressively detects, recognizes and understands it at three levels: scene-level recognition, scene-level understanding, and practical application. The key techniques address two vision tasks on scene images, scene graph generation and image captioning, and the proposed scene understanding method is deployed on a mobile robot terminal, where images of the indoor environment are acquired and processed in real time to understand the semantic content of the scene. Exploiting the connections and information sharing between scene graph generation and image captioning, we propose scene understanding algorithms for these two tasks to achieve accurate and comprehensive scene understanding. The main research contents and results of this thesis are as follows:

1. Scene graph generation based on relationship reasoning. Accurately inferring the visual relationships between objects is central to scene understanding, and the interplay between the contextual information of object pairs and their relationships can effectively regularize the space of relationship types and thereby improve the accuracy of relationship reasoning. Because the accuracy of scene graphs generated from scene images is still unsatisfactory, we propose a relationship reasoning network that incorporates this interplay into a deep neural network. A feature updating structure connects the features of objects and relationships and updates them iteratively to explore the contextual information between objects, and a graph attention mechanism captures the correlation between object pairs and their relationships to improve the accuracy of scene graph generation. Experiments on the Visual Genome dataset show that the proposed model outperforms the compared scene graph generation models.

2. Image caption generation based on multi-level semantic context information. Image captioning aims to describe, in natural language, information such as the main objects in an image and their relationships. Existing captioning methods, which convert the extracted image features directly into descriptive text, still produce unsatisfactory descriptions. This thesis therefore builds a multi-level semantic context information network that uses the context between different semantic levels to describe the scene image accurately and comprehensively. The model feeds the features of the different semantic levels into a feature refining structure in which they are mutually connected and iteratively updated; a context information extraction network then extracts the context between the semantic levels, and the resulting context is fed into an attention mechanism to improve captioning accuracy. Reinforcement learning is used to train the model by directly optimizing the evaluation metric, which further improves the descriptions. Experiments on the COCO dataset demonstrate the effectiveness of the proposed image captioning method.

3. Scene understanding based on multi-level semantic task generation. Existing scene understanding methods target a single vision task or a subset of tasks, and their results are unsatisfactory. We propose a multi-level semantic task generation network that exploits the mutual connections among object detection, relationship detection and image captioning to solve these tasks simultaneously, improve the accuracy of each, and achieve more comprehensive and accurate scene understanding. First, a message passing graph connects the features of the different semantic levels and updates them iteratively, improving the accuracy of scene graph generation. Second, because image captioning results remain unsatisfactory, the model uses a fused attention mechanism to extract more useful feature information and improve captioning performance. Finally, the model uses the fused attention mechanism to improve image captioning while using the mutual connection and refinement of the different semantic features to boost object detection and scene graph generation. Experiments on datasets such as Visual Genome show that the proposed method can learn the different vision tasks jointly so that they improve one another.

4. Scene understanding on a mobile robot. Existing deep-learning-based scene understanding methods are mainly implemented on computers, which makes practical testing and application difficult. We propose a scene understanding system that builds an experimental environment on a mobile robot terminal, acquires images of the indoor scene in real time, and understands their semantic content. Unlike the static images used in server-side testing, the system exploits the good compatibility between the Linux operating system and the deep learning framework to acquire and process images of the scene dynamically with a visual sensor. It performs object detection and recognition, relationship detection and image captioning on the mobile robot terminal, so that the robot perceives and understands the indoor environment from multiple aspects.

In general, this thesis conducts a systematic and in-depth study of scene understanding methods and proposes three learning algorithms for the main vision tasks in scene understanding, which are applied to scene graph generation and image captioning with good results. Finally, the proposed scene understanding method is tested experimentally on a mobile robot terminal to understand the semantic content of indoor scenes, with some success, which is of practical significance for moving scene understanding technology from computer terminals to intelligent mobile terminals.
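Contributions 1 and 3 are described only at a high level: object and relationship features are mutually connected and iteratively updated, and an attention mechanism weights the correlation between object pairs and their relationships. The sketch below illustrates that general idea under stated assumptions; it is not the thesis's network. Features are plain Python lists, relationship features are stored per ordered (subject, object) pair, each update is a fixed convex blend controlled by a hypothetical mixing factor `mix`, and a parameter-free scaled dot-product attention stands in for the learned graph attention layer. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Pair = Tuple[int, int]  # (subject index, object index)


def add(a: Vector, b: Vector) -> Vector:
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, s: float) -> Vector:
    return [x * s for x in a]


def attention_weights(query: Vector, keys: List[Vector]) -> List[float]:
    """Scaled dot-product attention weights (softmax over keys).
    Assumption: a stand-in for a learned graph attention layer."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def refine(objects: List[Vector], relations: Dict[Pair, Vector],
           steps: int = 2, mix: float = 0.5):
    """Hypothetical iterative refinement of object and relationship features.

    objects:   one feature vector per detected object
    relations: one feature vector per ordered (subject, object) pair
    mix:       hypothetical blending factor between old features and messages
    """
    for _ in range(steps):
        # 1) Relationship update: blend each relation feature with the mean
        #    of its subject and object features (object -> relation messages).
        new_rel: Dict[Pair, Vector] = {}
        for (s, t), r in relations.items():
            context = scale(add(objects[s], objects[t]), 0.5)
            new_rel[(s, t)] = add(scale(r, 1.0 - mix), scale(context, mix))

        # 2) Object update: each object attends over the relations it takes
        #    part in, as subject or object (relation -> object messages).
        new_obj: List[Vector] = []
        for i, o in enumerate(objects):
            incident = [r for (s, t), r in new_rel.items() if i in (s, t)]
            if not incident:
                new_obj.append(o)  # isolated object: keep its feature
                continue
            w = attention_weights(o, incident)
            msg = [sum(wk * r[k] for wk, r in zip(w, incident)) for k in range(len(o))]
            new_obj.append(add(scale(o, 1.0 - mix), scale(msg, mix)))

        objects, relations = new_obj, new_rel
    return objects, relations


if __name__ == "__main__":
    objs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    rels = {(i, j): [0.0, 0.0] for i in range(3) for j in range(3) if i != j}
    new_objs, new_rels = refine(objs, rels, steps=2)
    print(new_objs)
    print(new_rels[(0, 1)])
```

With `mix=0.0` the features come back unchanged; larger values let each object absorb more context from its relationships, and each relationship more context from its two endpoints. A trained model would replace the fixed blend and the parameter-free dot product with learned projections, and would feed the refined relationship features to a predicate classifier.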
Keywords/Search Tags: Scene understanding, Scene graph generation, Image captioning, Context information, Attention mechanism, Mobile robot