| The spatial relationships between objects in an image help in gaining a deep understanding of the image. Spatial relationship recognition has recently gained increasing attention and has been applied in various computer vision applications, but it still presents several challenges. The semantics of spatial relations rely not only on geometric properties and common-sense knowledge, but also on distinguishing the relative importance of those relations. In addition, objects of different categories may have different spatial relations, which makes recognition even more complex. To address these issues, this thesis puts forward a spatial relationship recognition model based on multi-feature fusion in a Transformer, and an optimized spatial relationship recognition model based on multiple spatial feature extraction and graph convolutional neural network reasoning. The main contributions of this thesis can be summarized as follows: (1) We propose a spatial attention module, which uses the semantic knowledge of subject-object pairs to strengthen the relationships between spatially related subject-object pairs, so as to distinguish the important spatial relationships in the image. Based on the spatial attention module, we build a spatial Transformer module to integrate the spatial, semantic, and visual information of objects and realize knowledge transfer between objects; (2) We study a variety of methods, such as a spatial relationship extraction module and spatial relationship graph construction based on spatial attention, to acquire diverse representations of spatial relations, and then use a graph convolutional neural network for reasoning, so as to enhance spatial relationship recognition; (3) We carry out experiments on a real-world dataset and compare our approach with the latest spatial relationship recognition algorithms. The experimental results demonstrate that our approach achieves higher accuracy and effectively addresses the problem of spatial relationship recognition. |