| Scene graph generation refers to detecting targets in an image, inferring the relationships between them, and representing the image as a graph structure. The scene graph serves as a bridge between natural language and computer vision and has become a popular research topic in image understanding in recent years. Deep learning, as a powerful tool for image understanding, has been widely applied to this task. However, existing scene graph generation methods still have two problems. First, the diversity of the relationships they infer is limited. On the one hand, imperfect features limit relationship diversity: existing methods rely solely on visual features for category reasoning, and the differences between similar relationships are small. On the other hand, the long-tailed distribution of the data sets limits relationship diversity: samples of common triplets make up most of a data set, while many uncommon relationships have only a few samples. Existing methods predict similar relationships as the common ones to increase recall, which hurts relationship diversity. Second, existing scene graph generation methods have poor domain adaptability. They are all built on specific natural-image data sets, and most carry reasoning habits specific to those data sets, which limits their domain adaptability. To address the first problem of limited relationship diversity, this work proposes a scene graph generation method based on global-semantic information assistance, called SGG_G-SIA. First, SGG_G-SIA integrates the global statistical knowledge and semantic information provided by the data set into a global semantic encoding and fuses it with visual features to represent targets and relationships, which alleviates the poor relationship diversity caused by imperfect features. Second, SGG_G-SIA uses the reprocessed
global statistical knowledge to guide the inference of target and relationship categories, which mitigates the effect of the long-tailed distribution of the data set and increases relationship diversity. Finally, SGG_G-SIA designs separate networks to perform feature fusion and category reasoning for targets and for relationships, so that each module meets the aggregation needs of the information it handles. To address the second problem of poor domain adaptability, this work proposes a scene graph generation method based on multi-modal fusion and counterfactual reasoning, called SGG_MFCR. SGG_MFCR fuses information from the visual and language modalities into the predictive features of relationships, providing richer information for representing relationships. SGG_MFCR then adopts a counterfactual reasoning strategy to capture the reasoning habits specific to the training data set and explicitly removes them at test time, yielding a network that predicts common and uncommon relationships fairly and adapts well across domains. The SGG_MFCR network trained on existing data sets can be applied directly to computer-generated images without relying on their annotations or on secondary training. Finally, this work generates sets of computer-generated images from image descriptions and semantic layouts, and applies SGG_MFCR to these image sets to verify the domain adaptability of the network and to generate robust scene graphs that help people understand computer-generated images. The above methods have been tested comprehensively in this work. Experimental results show that the two proposed methods generate more robust scene graphs for describing images. Compared with existing scene graph generation methods, SGG_G-SIA significantly improves the feature
richness of targets and relationships and the diversity of relationships. SGG_MFCR significantly improves the feature richness of targets and relationships, the diversity of relationships, and domain adaptability. In summary, the scene graph generation methods proposed in this work outperform existing scene graph generation methods in several respects. Some of the results have been published as papers in SCI-indexed journals. |
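
The abstract does not say how SGG_MFCR removes data-set-specific reasoning habits at test time. As a minimal sketch of one common form of counterfactual debiasing, and not necessarily the method used in this work, the scores a model produces when the visual evidence is masked out can be subtracted from the scores it produces with the real input. The function names, relationship labels, and score values below are all hypothetical and chosen only for illustration.

```python
def debias_by_counterfactual(factual_scores, counterfactual_scores):
    """Subtract the counterfactual (visual input masked) scores from the factual ones.

    What remains is the part of each score that depends on the visual
    evidence rather than on the prior absorbed from the training data set.
    """
    if len(factual_scores) != len(counterfactual_scores):
        raise ValueError("score lists must have the same length")
    return [f - c for f, c in zip(factual_scores, counterfactual_scores)]


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical relationship labels and scores (assumptions, not real results).
LABELS = ["on", "standing on", "riding"]
factual = [2.5, 2.0, 0.5]            # scores with the real visual features
counterfactual = [2.25, 0.5, 0.25]   # scores with the visual features masked out

debiased = debias_by_counterfactual(factual, counterfactual)
biased_label = LABELS[argmax(factual)]      # the common relationship wins
debiased_label = LABELS[argmax(debiased)]   # the more specific relationship wins
```

In this toy case the common label "on" has the highest raw score mostly because of the prior, so after subtraction the more specific label "standing on" is selected. This mirrors the stated goal of predicting common and uncommon relationships fairly.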