Research On Image Caption Algorithm Based On Graph Convolution Network

Posted on: 2022-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: J J Liu
GTID: 2518306512471894
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Image captioning is a typical cross-modal task that has been widely applied and has received much attention as deep learning has developed. Encoder-decoder networks are currently the mainstream approach. The encoder uses a graph convolutional network (GCN) to learn scene-graph features and uses Bottom-Up region features, rather than features of the whole image, to establish the mapping between image and text. Complex features and extractors provide a great diversity of information, but they also make this mapping harder to learn. This thesis therefore studies GCN-based image captioning and improves network performance by fusing multiple features and capturing their latent correlation with the features of the description sentence.

Scene graphs introduce redundant information, and it is difficult to match the image features of Bottom-Up candidate regions accurately with the description sentence. To address both problems, this thesis uses a GCN to learn the scene-graph features and adjusts the feature weights between objects during learning, so that different objects carry different importance for the caption. On this basis, the image features of the Bottom-Up candidate regions are enhanced, and the regions, weighted by importance, are fed into the Top-Down LSTM decoder so that they correspond to each word of the description; this strengthens the features of the main objects in the image. Experimental results show that enhancing the main-object features effectively improves network performance.

To describe the main objects in the generated sentences in more detail, this thesis proposes a feature fusion method based on salient target features. First, the PoolNet salient object detection network extracts the salient target region; then the ResNet-101 network extracts the salient target feature; finally, the image feature and the salient target feature are combined to learn the complementary relationship between them. By drawing on several kinds of information and combining the advantages of the different features, the fusion method enriches the descriptive detail of the main objects.

Ablation and comparative experiments on the standard test dataset verify the effectiveness of the proposed algorithm. The results show that the algorithm is clearly effective at capturing the relationships between the main objects and the other objects and at enriching the descriptive detail of the main objects, which improves the performance of the model.
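The pipeline the abstract describes (graph convolution over scene-graph objects, importance-weighted enhancement of region features, fusion with a salient target feature) can be illustrated with a toy sketch. This is not the thesis implementation. The dot-product attention weights, the norm-based importance score, the `(1 + importance)` scaling, and fusion by concatenation are all assumptions chosen for illustration, and every function name is hypothetical. A real model would use learned parameters and features from detectors such as those named in the abstract.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def graph_conv_step(node_feats, edges):
    """One propagation step over a scene graph (assumed form).

    node_feats: list of feature vectors, one per detected object.
    edges: dict mapping a node index to a list of neighbour indices.
    Each node aggregates itself and its neighbours, weighted by a
    softmax over dot-product similarity (a stand-in for learned weights).
    """
    updated = []
    for i, feat in enumerate(node_feats):
        nbrs = [i] + edges.get(i, [])
        weights = softmax([dot(feat, node_feats[j]) for j in nbrs])
        agg = [
            sum(w * node_feats[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(feat))
        ]
        updated.append(agg)
    return updated


def object_importance(node_feats):
    """Importance per object: softmax over L2 norms (assumed scorer)."""
    return softmax([math.sqrt(dot(f, f)) for f in node_feats])


def enhance_regions(region_feats, importance):
    """Scale each region feature by (1 + importance of its object)."""
    return [
        [(1.0 + w) * x for x in feat]
        for feat, w in zip(region_feats, importance)
    ]


def fuse_with_salient(region_feats, salient_feat):
    """Fuse by concatenating the salient feature onto each region feature."""
    return [feat + salient_feat for feat in region_feats]


if __name__ == "__main__":
    # Three objects with 2-d toy features; object 2 is linked to both others.
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    edges = {0: [2], 1: [2], 2: [0, 1]}
    graph_feats = graph_conv_step(nodes, edges)
    importance = object_importance(graph_feats)
    enhanced = enhance_regions(nodes, importance)
    fused = fuse_with_salient(enhanced, [0.5, 0.5, 0.5])
    print(importance)
    print(fused)
```

In this sketch the fused vectors are what would be passed on to a decoder. Concatenation is only one possible fusion choice, and the abstract does not state which operator the thesis uses.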
Keywords/Search Tags:Image caption, Graph convolutional networks, Scene graph, Attention mechanism, Salient target feature