| With the continuous development of science and technology, indoor robot technology is being updated very frequently. In human-computer interaction, a robot must be able to communicate with people and understand their semantic instructions. The 2D grid map that a robot builds through simultaneous localization and mapping (SLAM) is a purely data-level map. Each grid point contains pose and position coordinates. These data mark the coordinates of each location but carry no semantic information. For example, when people give a robot an instruction in everyday terms, the robot cannot understand its meaning. In addition, commanding an operation in a specific scene requires entering the exact coordinates of that scene, yet it is difficult for humans to express a position as coordinates. We therefore need to combine appropriate semantic information with the map generated by the robot, producing an effective semantic map that improves the efficiency of human-computer interaction. This paper mainly proposes combining local object information with global scene features for scene recognition. First, coherent image information is extracted frame by frame from the robot's vision sensor by means of deep learning, and the RGB features extracted by a classification network and a detection network are fused, which improves the robot's scene recognition ability and makes the semantic information more accurate. In addition, through a map segmentation algorithm, we attach the scene semantic information to map regions and add the robot's coordinate and pose information, as well as the supervised object information obtained from individual object detection, to finally generate a metric semantic map. We evaluate our approach on three public datasets and achieve good results. In the second part of the work, because the 
nodes and edges of a topological map can effectively reflect the positional relationships in the map together with some semantic information, and because such a map requires little storage, we build a topological map on top of the metric semantic map, which makes it easier for the robot to perform subsequent tasks. We propose generation processes for three types of topological semantic map, based respectively on odometry information, scene area, and image aggregation. We also propose a deep-learning-based method for aggregating topological image features. This method aggregates the feature information held in each node, which helps preserve the coherent features of the data. Finally, we test the storage size and detection accuracy of the topological map; its performance can support the robot's understanding of various instructions and the execution of navigation tasks. |