Research On Multi-task Learning Based Image Captioning Algorithm

Posted on:2021-05-25

Degree:Master

Type:Thesis

Country:China

Candidate:S Cao

GTID:2428330647458913

Subject:Computer Science and Technology

Abstract/Summary:

Image captioning is the task of having a machine automatically generate a text description of an image. It lies at the intersection of computer vision and natural language processing, and it is a multimodal task because it processes both image and text data. Compared with single-modal tasks, image captioning is closer to real application scenarios and is widely used in many fields. To generate better captions, this thesis brings knowledge from multiple domains into the model by adding auxiliary tasks, so that the model becomes more capable across those domains. I focus on several key problems in image captioning: insufficient extraction of information from the key regions of an image, insufficient use of features during the conversion from the image modality to the text modality, and the inconsistency between how the model is trained and how it is evaluated. The main research contents of this thesis are as follows:

(1) In the image feature extraction stage, multi-label image classification is added as an auxiliary task. The pre-trained multi-label classifier contains rich image representation information for category recognition, which biases the extracted features toward the key regions of the image. In addition, combining label embeddings with the image features as the input to the next stage enriches the extracted information and alleviates the problem of insufficient information extraction.

(2) In the caption generation stage, a language model is used as an auxiliary task. The labels and captions are combined as pre-training data for the language model, which learns to generate captions from the labels. The pre-trained language model contains rich semantic information that helps the main task extract semantic information from the image features more fully before the final captions are generated.

(3) Image captioning models often suffer from exposure bias (during training the decoder is fed ground-truth previous words, but at test time it must condition on its own predictions), and the mismatch between the training objective and the evaluation metrics can hurt performance even while the training loss keeps decreasing. To address these two problems, I propose a reinforcement-learning-based method for training the image captioning model, in which the reward function is computed with the evaluation metric.

The basic architecture of the algorithm follows the encoder-decoder framework: the encoder extracts image features and the decoder generates the image descriptions. The experimental results show that adding the auxiliary tasks plays an important role in improving the model's performance, and that the reinforcement-learning-based training method is effective in addressing the issues above.
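As a rough illustration of contribution (1), the sketch below shows the two pieces the abstract mentions: a multi-label classification loss (each label is an independent yes/no decision, scored with binary cross-entropy) and the fusion of label embeddings with an image feature vector. The abstract does not say how the thesis combines them; mean-pooling the embeddings of the predicted labels and concatenating the result to the image feature is an assumption, and all function names, the 0.5 threshold, and the toy vectors are hypothetical. Plain lists stand in for the tensors a real model would use.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean binary cross-entropy: each label is an independent yes/no decision."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(logits)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of labels whose sigmoid probability reaches the (hypothetical) threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


def fuse_features(image_feat: Sequence[float], label_ids: List[int],
                  label_emb: Dict[int, List[float]]) -> List[float]:
    """Concatenate the image feature with the mean embedding of the predicted labels.

    Mean-pooling followed by concatenation is an assumed fusion rule,
    not necessarily the one used in the thesis.
    """
    dim = len(next(iter(label_emb.values())))
    if label_ids:
        pooled = [sum(label_emb[i][d] for i in label_ids) / len(label_ids)
                  for d in range(dim)]
    else:
        pooled = [0.0] * dim  # no confident label: contribute a zero vector
    return list(image_feat) + pooled


if __name__ == "__main__":
    logits = [2.0, -1.0, 0.5]                 # toy classifier outputs for 3 labels
    labels = predict_labels(logits)           # [0, 2]
    emb = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}  # toy 2-d label embeddings
    print(fuse_features([0.1, 0.2, 0.3], labels, emb))   # [0.1, 0.2, 0.3, 0.5, 0.5]
```

The fused vector is what would be handed to the decoder. Concatenation keeps the image feature intact and simply appends label information; attention over the individual label embeddings would be an equally plausible alternative.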
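For contribution (2), the abstract says only that labels and captions are combined as pre-training data and that the language model generates captions from labels. One way to build such data, assuming a sequence-to-sequence style model conditioned on the labels and trained with teacher forcing, is sketched below. The special tokens, the lower-casing and whitespace tokenization, and the function names are all hypothetical choices for illustration.

```python
from typing import Dict, Iterable, List, Tuple

PAD, BOS, EOS, UNK = "<pad>", "<bos>", "<eos>", "<unk>"  # hypothetical special tokens

Example = Tuple[List[str], List[str], List[str]]


def make_lm_example(labels: Iterable[str], caption: str) -> Example:
    """Build one (source, decoder_input, target) triple for teacher forcing.

    source:        de-duplicated, sorted label words that condition the model
    decoder_input: <bos> followed by the caption tokens
    target:        the caption tokens followed by <eos> (decoder_input shifted by one)
    """
    source = sorted(set(labels))
    words = caption.lower().split()
    return source, [BOS] + words, words + [EOS]


def build_vocab(examples: List[Example]) -> Dict[str, int]:
    """Assign ids: special tokens first, then every other token in sorted order."""
    specials = [PAD, BOS, EOS, UNK]
    tokens = {tok for ex in examples for seq in ex for tok in seq} - set(specials)
    return {tok: i for i, tok in enumerate(specials + sorted(tokens))}


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, falling back to <unk> for unseen words."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


if __name__ == "__main__":
    ex = make_lm_example(["frisbee", "dog"], "A dog catches a frisbee")
    vocab = build_vocab([ex])
    print(ex[0])                   # ['dog', 'frisbee']
    print(encode(ex[2], vocab))    # target ids the language model learns to predict
```

Sharing one vocabulary between labels and captions lets the label words reuse the same embeddings as caption words, which is one reason such a pre-trained model could carry semantic knowledge over to the captioning decoder.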
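Contribution (3) trains the captioner with reinforcement learning and uses the evaluation metric as the reward. The abstract does not name the algorithm or the metric, so the sketch below swaps in a common choice: a REINFORCE loss whose baseline is the score of the greedily decoded caption (the idea behind self-critical sequence training), with a toy unigram F1 standing in for a real metric such as CIDEr or BLEU. Because the model is scored on captions it sampled itself, it is no longer trained only on ground-truth prefixes, and because the reward is a metric, the training signal matches evaluation. Plain floats are used here to show the arithmetic; in a real model the log-probabilities would be differentiable tensors.

```python
from collections import Counter
from typing import List, Sequence


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """Toy stand-in for an evaluation metric such as CIDEr or BLEU."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def self_critical_loss(sample_logprobs: Sequence[float], sampled: List[str],
                       greedy: List[str], reference: List[str]) -> float:
    """REINFORCE loss with the greedy caption's score as the baseline.

    sample_logprobs holds log p(word) for each word of the sampled caption.
    Minimising the loss raises the probability of sampled captions that
    score better than the greedy caption and lowers it for worse ones.
    """
    advantage = unigram_f1(sampled, reference) - unigram_f1(greedy, reference)
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    ref = ["a", "dog", "catches", "a", "frisbee"]
    sampled = ["a", "dog", "catches", "a", "frisbee"]  # reward 1.0
    greedy = ["a", "dog", "runs"]                      # reward 0.5 (baseline)
    print(self_critical_loss([-0.5] * 5, sampled, greedy, ref))
```

Using the greedy caption as the baseline means no separate value network has to be trained; a sample is only reinforced if it beats what the model would produce at test time.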

Keywords/Search Tags:

Image captioning, multi-task learning, auxiliary task, reinforcement learning
