
Research On Social Image Captioning Based On Deep Learning

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chu
GTID: 2518306500483324
Subject: Software engineering
Abstract/Summary:
With the rise of big data and deep learning, image captioning has become a hot research direction in artificial intelligence. Although traditional methods can complete the basic description task, they still fall short in accuracy and richness: they use only the attributes or only the visual features of an image, and neither the correlation between the two nor the complementarity between features of different modalities has been studied in detail. This thesis therefore proposes two improved image captioning methods.

The first is social image captioning based on visual attention and user attention. It uses users' personalized tags to correct the deviation between those tags and the visual features of social images; moreover, because different users describe the same content with different vocabularies, the personalized tags further enrich and refine the generated sentences.

The second is multimodal fusion image captioning based on hierarchical attention, which exploits the complementarity between different image features. The image features are divided into three modes: tags, which represent the high-level semantic information of the image; the features extracted from the fully connected layer of a CNN, which represent the mid-level semantic information; and the visual features extracted from the last convolutional layer of the CNN, which represent the low-level information. To integrate these modes efficiently, the thesis processes the three modes at different levels of attention, so that the model can dynamically adjust the contribution of each mode and further improve the accuracy and richness of the captions.

To verify their effectiveness, both models were evaluated on the MS COCO dataset. Compared with traditional models, the proposed models improve both the evaluation metrics and the quality of the descriptions; in addition, they produce richer and more personalized captions.
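The abstract does not include code, so the following is a minimal PyTorch sketch of how one decoding step of the first method might combine visual attention over CNN grid features with user attention over embedded personalized tags. The module name, layer sizes, the additive scoring, and the concatenation-based fusion are all illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn


class DualAttentionDecoderStep(nn.Module):
    """One decoding step fusing visual attention (over conv-grid features)
    with user attention (over embedded user tags). A sketch of the idea only;
    all dimensions and names are assumptions, not the thesis's code."""

    def __init__(self, hid=512, vis_dim=2048, tag_dim=300):
        super().__init__()
        self.vis_att = nn.Linear(hid + vis_dim, 1)   # scores visual regions
        self.tag_att = nn.Linear(hid + tag_dim, 1)   # scores user tags
        self.cell = nn.LSTMCell(vis_dim + tag_dim + hid, hid)

    def attend(self, h, feats, scorer):
        # h: (B, hid); feats: (B, N, D) -> weighted context vector (B, D)
        n = feats.size(1)
        q = h.unsqueeze(1).expand(-1, n, -1)             # (B, N, hid)
        scores = scorer(torch.cat([q, feats], dim=-1))   # (B, N, 1)
        alpha = torch.softmax(scores, dim=1)             # attention weights
        return (alpha * feats).sum(dim=1)                # context (B, D)

    def forward(self, word_emb, h, c, vis_feats, tag_embs):
        # word_emb: (B, hid); vis_feats: (B, R, vis_dim); tag_embs: (B, T, tag_dim)
        v_ctx = self.attend(h, vis_feats, self.vis_att)  # visual context
        t_ctx = self.attend(h, tag_embs, self.tag_att)   # user-tag context
        # the tag context corrects/enriches the visual context at each step
        h, c = self.cell(torch.cat([v_ctx, t_ctx, word_emb], dim=-1), (h, c))
        return h, c

A caller would run this step once per output word, feeding the previous word's embedding and, e.g., a 7x7 conv grid flattened to 49 region vectors plus a handful of embedded user tags.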
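For the second method, the sketch below shows one plausible two-stage realization of hierarchical attention over the three modes the abstract names (tag embeddings as high-level, FC-layer features as mid-level, conv-layer features as low-level): within-mode attention first, then attention over the three mode summaries. The two-stage scheme, projections, and dimensions are assumptions for illustration.

import torch
import torch.nn as nn


class HierarchicalFusion(nn.Module):
    """Hierarchically fuses three feature modes: tag embeddings (high-level
    semantics), FC-layer CNN features (mid-level), and last-conv-layer grid
    features (low-level). Dimensions and the fusion scheme are assumptions."""

    def __init__(self, hid=512, conv_dim=2048, fc_dim=4096, tag_dim=300):
        super().__init__()
        self.conv_att = nn.Linear(hid + conv_dim, 1)   # low-level attention
        self.tag_att = nn.Linear(hid + tag_dim, 1)     # high-level attention
        # project each mode into a shared space before mode-level attention
        self.proj = nn.ModuleDict({
            "low": nn.Linear(conv_dim, hid),
            "mid": nn.Linear(fc_dim, hid),
            "high": nn.Linear(tag_dim, hid),
        })
        self.mode_att = nn.Linear(hid + hid, 1)        # attention over modes

    def attend(self, h, feats, scorer):
        # h: (B, hid); feats: (B, N, D) -> weighted context vector (B, D)
        n = feats.size(1)
        q = h.unsqueeze(1).expand(-1, n, -1)
        alpha = torch.softmax(scorer(torch.cat([q, feats], dim=-1)), dim=1)
        return (alpha * feats).sum(dim=1)

    def forward(self, h, conv_feats, fc_feat, tag_embs):
        # stage 1: within-mode attention for the two sequence-like modes
        low = self.proj["low"](self.attend(h, conv_feats, self.conv_att))
        high = self.proj["high"](self.attend(h, tag_embs, self.tag_att))
        mid = self.proj["mid"](fc_feat)                # single global vector
        # stage 2: mode-level attention decides how much each level contributes
        modes = torch.stack([low, mid, high], dim=1)   # (B, 3, hid)
        return self.attend(h, modes, self.mode_att)    # fused context (B, hid)

The mode-level attention in stage 2 is what lets the decoder dynamically shift weight between semantic tags and raw visual evidence from word to word, matching the abstract's claim that different attention levels adjust the features of different modes.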
Keywords/Search Tags:image captioning, tags, user attention, visual attention, multi-modal fusion