| As a carrier of information, an image conveys more intuitive and clearer visual information than text. Extracting and representing the rich semantic information in images benefits downstream vision tasks. Scene graph generation uses an object detection module to extract the entities in an image and the relationships between them as triplets (subject, relationship, object), and builds a structured semantic representation from them. However, the relationships between entities in the training datasets for this task follow a long-tailed distribution: the head relationship categories are often coarse-grained descriptions, while the tail relationship categories, which carry richer information and more practical significance, lack sufficient training samples. As a result, the learned scene graph has little practical significance. At the same time, the long-tail effect leaves the relationship semantic space imbalanced, so when predicting relationships that are semantically similar but essentially different, the model tends to ignore context and favor high-frequency head relationships, which results in ambiguous relationship predictions. To address the long-tail problem of scene graph generation and to distinguish and predict context-dependent relationship categories effectively, so that the generated scene graphs are both meaningful and accurate, this paper makes the following main contributions: (1) An unbiased scene graph generation algorithm based on adaptive regularization is proposed. During the training of each mini-batch of entities, the algorithm uses the prior relationship frequency information of the training samples to adaptively adjust the weights of the fully connected classifier. This prevents the classifier from overfitting and thus alleviates the impact of the long-tail problem on scene graph generation. The algorithm is tested and analyzed on the Visual Genome dataset; the results verify its effectiveness when built on advanced unbiased scene graph generation methods and show that it further improves their performance. (2) An unbiased scene graph generation method based on hypergraph attention is proposed. This method uses a local attention module to capture detailed information about the relationships between entities. Because contextual information is important, it uses a hypergraph convolutional neural network to learn higher-order global information between entities. To address the relationship prediction bias caused by the imbalanced semantic space, it balances predictions by re-weighting the losses of the different relationship categories. The method is evaluated on the Visual Genome scene graph dataset. Experimental results show that it can both distinguish semantically similar relationships effectively and predict context-dependent relationships accurately. In summary, the scene graph generation methods and models proposed in this paper demonstrate their feasibility on the Visual Genome dataset, provide new ideas and methods for addressing the problems caused by the long-tail effect in scene graph generation, and have both theoretical significance and practical application value. |
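
The idea in contribution (2) of adjusting the loss weights of different relationship categories can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so the inverse-frequency rule, the `power` exponent, the function names, and the relationship counts below are all assumptions made for illustration, not the method proposed in the paper.

```python
import math


def inverse_frequency_weights(counts, power=0.5):
    """Per-class weights proportional to (1 / count) ** power, scaled to mean 1.

    Assumed weighting rule: rarer (tail) categories get larger weights.
    """
    raw = [(1.0 / c) ** power for c in counts]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return weights[label] * (log_sum_exp - logits[label])


# Hypothetical training counts for three relationship categories,
# e.g. "on" (head), "standing on", "parked on" (tail).
counts = [10000, 400, 100]
weights = inverse_frequency_weights(counts)

# Hypothetical classifier scores that favor the head category.
logits = [2.0, 1.0, 0.5]
head_loss = weighted_cross_entropy(logits, 0, weights)
tail_loss = weighted_cross_entropy(logits, 2, weights)
```

With these assumed counts, a misclassified tail sample contributes a much larger loss than a correctly scored head sample, which is the balancing effect that re-weighting aims for. In a deep learning framework the same weights would typically be passed to the built-in weighted cross-entropy loss rather than computed by hand.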