Research On Multi-task Learning Based Image Captioning Algorithm

Posted on: 2021-05-25; Degree: Master; Type: Thesis
Country: China; Candidate: S Cao
GTID: 2428330647458913; Subject: Computer Science and Technology
Abstract/Summary:
Image captioning is the task of automatically generating a text description for an image, and it lies at the intersection of computer vision and natural language processing. It is a multimodal task that processes both image and text data. Compared with single-modality tasks, image captioning is closer to real application scenarios and is widely used in various fields. To generate better captions, this thesis draws on multi-domain knowledge, introduced through auxiliary tasks, to make the model more capable. I focus on several key problems in image captioning: insufficient extraction of information from key regions of the image, insufficient use of features during the conversion from the image modality to the text modality, and inconsistency between model training and evaluation. The main research contents of this thesis are as follows:

(1) In the image feature extraction stage, multi-label image classification is added as an auxiliary task. The pre-trained multi-label classifier contains rich image representation information for category recognition, which biases the extracted features toward key regions of the image. In addition, combining label embeddings with the image features as input to the next stage enriches the extracted information and alleviates the problem of insufficient information extraction.

(2) In the caption generation stage, a language model is used as an auxiliary task. Labels and captions are combined as pre-training data for the language model, which generates captions from the labels. The pre-trained language model contains rich semantic information, which helps the main task extract semantic information from the image features more fully before the final captions are generated.

(3) Two problems often arise in image captioning: exposure bias, and a mismatch between training and evaluation that can hurt model performance even while the loss value decreases. To address them, I propose a reinforcement learning based method for training the image captioning model, in which the evaluation metric is used to compute the reward function.

The fundamental architecture of the algorithm in this thesis is the encoder-decoder framework, in which the encoder extracts image features and the decoder generates the image descriptions. The experimental results show that adding the auxiliary tasks plays an important role in improving the model's performance, and that the reinforcement learning based training method is effective in addressing the two problems above.
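The abstract does not say which reinforcement learning algorithm or which evaluation metric the thesis uses, so the following is only a minimal sketch of the general idea in item (3): score a sampled caption with a metric and use that score as the reward in a policy-gradient loss. It assumes a REINFORCE-style update with the greedy caption's score as a baseline, and it uses a toy token-overlap score in place of a real captioning metric. All names here (`toy_reward`, `sample_caption`, `greedy_caption`, `policy_gradient_loss`) are hypothetical and not taken from the thesis.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(candidate, reference):
    """Stand-in metric (assumption): fraction of candidate tokens found in the reference."""
    if not candidate:
        return 0.0
    ref = set(reference)
    return sum(tok in ref for tok in candidate) / len(candidate)


def sample_caption(step_logits, vocab, rng):
    """Sample one token per decoding step; return the tokens and their summed log-probability."""
    tokens, log_prob = [], 0.0
    for logits in step_logits:
        probs = softmax(logits)
        idx = rng.choices(range(len(vocab)), weights=probs)[0]
        tokens.append(vocab[idx])
        log_prob += math.log(probs[idx])
    return tokens, log_prob


def greedy_caption(step_logits, vocab):
    """Pick the highest-scoring token at every step (used here as the baseline caption)."""
    return [vocab[max(range(len(vocab)), key=lambda i: logits[i])]
            for logits in step_logits]


def policy_gradient_loss(step_logits, vocab, reference, rng):
    """REINFORCE-style loss: -(reward - baseline) * log p(sampled caption)."""
    sampled, log_prob = sample_caption(step_logits, vocab, rng)
    baseline = toy_reward(greedy_caption(step_logits, vocab), reference)
    advantage = toy_reward(sampled, reference) - baseline
    return -advantage * log_prob, sampled, advantage


# Hypothetical decoder outputs: three decoding steps over a four-word vocabulary.
vocab = ["a", "dog", "cat", "runs"]
step_logits = [
    [2.0, 0.1, 0.1, 0.1],
    [0.1, 2.0, 0.5, 0.1],
    [0.1, 0.1, 0.1, 2.0],
]
reference = ["a", "dog", "runs"]
loss, sampled, advantage = policy_gradient_loss(
    step_logits, vocab, reference, random.Random(0)
)
```

Because the reward is computed on whole captions sampled from the model itself, the training signal comes from the same kind of output that is scored at evaluation time, which is the motivation the abstract gives for this training method. In a real system the logits would come from the decoder and the loss would be differentiated with respect to the model parameters.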
Keywords/Search Tags:Image captioning, multi-task learning, auxiliary task, reinforcement learning