| Commonsense knowledge is the body of information that spans the widest range of human knowledge and helps people understand everyday events. It plays an important role in artificial intelligence. Researchers have constructed many datasets containing commonsense knowledge, but existing datasets usually focus on text data or only on the semantic understanding of image data, and ignore the correlation between image data and text data that carry commonsense knowledge. If image and text data can be linked to represent commonsense, they can convey information better and provide effective assistance for various artificial intelligence tasks. Based on this understanding, this paper constructs a commonsense knowledge base for image-text data, the Graphic-Text Commonsense Base. It integrates commonsense data from images and texts, including the entities extracted from images, the commonsense relationships between entities, the commonsense inherent in the entities themselves, and textual descriptions of the images. In constructing the Graphic-Text Commonsense Base, image-text knowledge usually resides in the relationships between entities. The most important task is therefore visual relationship detection, which captures most of the commonsense knowledge in images, and this paper studies that task in the context of building the Graphic-Text Commonsense Base. The research consists of three parts. (1) A visual relationship detection model based on label hierarchy. To improve the detection of fine-grained visual relationships, this paper divides relationship labels by granularity into coarse and fine levels to construct a hierarchical representation of labels, and proposes a visual relationship detection model based on this label hierarchy. The model uses the similarity between visual relationships and the bias of 
the data itself to construct a hierarchical representation of relationship labels, separating relationships into coarse-grained and fine-grained ones so that fine-grained relationships receive more attention. The proposed model outperforms state-of-the-art visual relationship detection methods on fine-grained relationship detection. (2) A visual relationship detection model based on the Graphic-Text Commonsense Base. To verify that the Graphic-Text Commonsense Base can provide rich information for image understanding tasks, this paper proposes a visual relationship detection model based on it. The model represents both the Graphic-Text Commonsense Base and the object features as graph structures and connects them, updates the node and edge representations in the graph through a gated graph neural network to learn the rich image and text information in the Graphic-Text Commonsense Base, and finally predicts the visual relationships in images. In experiments, the model outperforms existing visual relationship detection models that use external commonsense information. (3) Construction and display system of the Graphic-Text Commonsense Base. To address the problem of representing commonsense across images and texts, this paper combines visual relationship datasets (Visual Relationship Detection (VRD), etc.) with a textual commonsense base (ConceptNet) to supplement the commonsense knowledge missing for objects, attributes, and relationships, forming a Graphic-Text Commonsense Base built on commonsense relations between the image and text modalities. Each piece of commonsense data in the Graphic-Text Commonsense Base is represented as a seven-tuple, including entity (subject and object) labels, commonsense relations, the position of each entity in the image, image information, and an image description. In addition, to make it easier for users 
to view and use the knowledge in the Graphic-Text Commonsense Base, this paper designs and implements a display system for it. The system supports graph-structure display of the whole Graphic-Text Commonsense Base, graph-structure display of the commonsense in each image, precise queries by relationship and entity, statistics on entities and relationships, and online download. |
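
The seven-tuple described in part (3) can be illustrated with a minimal Python sketch. The abstract lists the kinds of information a record holds but not the exact field layout, so the field names, their order, the bounding-box format, and the sample records below are all assumptions made for illustration; `CommonsenseRecord` and `find_by_relation` are hypothetical names, not part of the described system.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
BoundingBox = Tuple[int, int, int, int]


@dataclass(frozen=True)
class CommonsenseRecord:
    """Hypothetical seven-field record; names and order are assumptions."""
    subject_label: str       # entity acting as the subject
    relation: str            # commonsense relation between the entities
    object_label: str        # entity acting as the object
    subject_box: BoundingBox # assumed position of the subject in the image
    object_box: BoundingBox  # assumed position of the object in the image
    image_id: str            # stands in for "image information"
    image_description: str   # textual description of the image

    def as_tuple(self) -> tuple:
        """Return the record as a plain seven-tuple."""
        return (self.subject_label, self.relation, self.object_label,
                self.subject_box, self.object_box,
                self.image_id, self.image_description)


def find_by_relation(records: List[CommonsenseRecord],
                     relation: str) -> List[CommonsenseRecord]:
    """Exact-match lookup by relation, loosely mirroring a precise query."""
    return [r for r in records if r.relation == relation]


# Made-up sample data, for illustration only.
records = [
    CommonsenseRecord("person", "ride", "horse",
                      (10, 20, 110, 220), (30, 120, 200, 300),
                      "img_0001.jpg", "A person rides a horse."),
    CommonsenseRecord("cup", "on", "table",
                      (50, 40, 90, 80), (0, 70, 300, 200),
                      "img_0002.jpg", "A cup sits on a table."),
]
```

Keeping subject, relation, and object as separate fields means the same records can feed both a graph view (entities as nodes, relations as edges) and simple per-relation or per-entity statistics.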