
Research On Social Image Captioning Based On Deep Learning

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: X L Chu
GTID: 2518306500483324
Subject: Software engineering
Abstract/Summary:
With the rise of big data and deep learning, image captioning has become a hot research direction in artificial intelligence. Although traditional methods can complete the basic description task, they still fall short in accuracy and richness: they use only the attributes or only the visual features of an image, and neither the correlation between the two nor the complementarity between features of different modalities has been studied in detail. This thesis therefore proposes two improved image captioning methods.

The first is social image captioning based on visual attention and user attention. It uses users' personalized tags to correct the deviation between those tags and the visual features of social images; moreover, because different users describe the same content with different vocabularies, the personalized tags further enrich and refine the generated sentences.

The second is multimodal fusion image captioning based on hierarchical attention, which exploits the complementarity between different image features. The image features are divided into three modes: tags, which represent the high-level semantic information of the image; the features extracted from the fully connected layer of a CNN, which represent the mid-level semantic information; and the visual features extracted from the last convolutional layer of the CNN, which represent the low-level information. To integrate these modes efficiently, the thesis processes the three modes at different levels of attention, so that the model can dynamically adjust the contribution of each mode and further improve the accuracy and richness of the captions.

To verify their effectiveness, both models were evaluated on the MS COCO dataset. Compared with traditional models, the proposed models improve both the evaluation metrics and the quality of the descriptions; in addition, they produce richer and more personalized captions.
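The abstract does not include code, so the following is a minimal PyTorch sketch of how one decoding step of the first method might combine visual attention over CNN grid features with user attention over embedded personalized tags. The module name, layer sizes, the additive scoring, and the concatenation-based fusion are all illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn as nn


class DualAttentionDecoderStep(nn.Module):
    """One decoding step fusing visual attention (over conv-grid features)
    with user attention (over embedded user tags). A sketch of the idea only;
    all dimensions and names are assumptions, not the thesis's code."""

    def __init__(self, hid=512, vis_dim=2048, tag_dim=300):
        super().__init__()
        self.vis_att = nn.Linear(hid + vis_dim, 1)   # scores visual regions
        self.tag_att = nn.Linear(hid + tag_dim, 1)   # scores user tags
        self.cell = nn.LSTMCell(vis_dim + tag_dim + hid, hid)

    def attend(self, h, feats, scorer):
        # h: (B, hid); feats: (B, N, D) -> weighted context vector (B, D)
        n = feats.size(1)
        q = h.unsqueeze(1).expand(-1, n, -1)             # (B, N, hid)
        scores = scorer(torch.cat([q, feats], dim=-1))   # (B, N, 1)
        alpha = torch.softmax(scores, dim=1)             # attention weights
        return (alpha * feats).sum(dim=1)                # context (B, D)

    def forward(self, word_emb, h, c, vis_feats, tag_embs):
        # word_emb: (B, hid); vis_feats: (B, R, vis_dim); tag_embs: (B, T, tag_dim)
        v_ctx = self.attend(h, vis_feats, self.vis_att)  # visual context
        t_ctx = self.attend(h, tag_embs, self.tag_att)   # user-tag context
        # the tag context corrects/enriches the visual context at each step
        h, c = self.cell(torch.cat([v_ctx, t_ctx, word_emb], dim=-1), (h, c))
        return h, c

A caller would run this step once per output word, feeding the previous word's embedding and, e.g., a 7x7 conv grid flattened to 49 region vectors plus a handful of embedded user tags.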
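For the second method, the sketch below shows one plausible two-stage realization of hierarchical attention over the three modes the abstract names (tag embeddings as high-level, FC-layer features as mid-level, conv-layer features as low-level): within-mode attention first, then attention over the three mode summaries. The two-stage scheme, projections, and dimensions are assumptions for illustration.

import torch
import torch.nn as nn


class HierarchicalFusion(nn.Module):
    """Hierarchically fuses three feature modes: tag embeddings (high-level
    semantics), FC-layer CNN features (mid-level), and last-conv-layer grid
    features (low-level). Dimensions and the fusion scheme are assumptions."""

    def __init__(self, hid=512, conv_dim=2048, fc_dim=4096, tag_dim=300):
        super().__init__()
        self.conv_att = nn.Linear(hid + conv_dim, 1)   # low-level attention
        self.tag_att = nn.Linear(hid + tag_dim, 1)     # high-level attention
        # project each mode into a shared space before mode-level attention
        self.proj = nn.ModuleDict({
            "low": nn.Linear(conv_dim, hid),
            "mid": nn.Linear(fc_dim, hid),
            "high": nn.Linear(tag_dim, hid),
        })
        self.mode_att = nn.Linear(hid + hid, 1)        # attention over modes

    def attend(self, h, feats, scorer):
        # h: (B, hid); feats: (B, N, D) -> weighted context vector (B, D)
        n = feats.size(1)
        q = h.unsqueeze(1).expand(-1, n, -1)
        alpha = torch.softmax(scorer(torch.cat([q, feats], dim=-1)), dim=1)
        return (alpha * feats).sum(dim=1)

    def forward(self, h, conv_feats, fc_feat, tag_embs):
        # stage 1: within-mode attention for the two sequence-like modes
        low = self.proj["low"](self.attend(h, conv_feats, self.conv_att))
        high = self.proj["high"](self.attend(h, tag_embs, self.tag_att))
        mid = self.proj["mid"](fc_feat)                # single global vector
        # stage 2: mode-level attention decides how much each level contributes
        modes = torch.stack([low, mid, high], dim=1)   # (B, 3, hid)
        return self.attend(h, modes, self.mode_att)    # fused context (B, hid)

The mode-level attention in stage 2 is what lets the decoder dynamically shift weight between semantic tags and raw visual evidence from word to word, matching the abstract's claim that different attention levels adjust the features of different modes.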
Keywords/Search Tags:image captioning, tags, user attention, visual attention, multi-modal fusion