
Research On Image Caption Model Of Multi-target Language

Posted on: 2021-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Zhang  Full Text: PDF
GTID: 2428330611499757  Subject: Computer technology
Abstract/Summary:
The image caption task lies at the intersection of natural language processing and computer vision and has attracted the interest of many scholars. With the rise of artificial intelligence, many effective image caption models have been proposed. However, most of them use Long Short-Term Memory networks (LSTMs) as generators, and LSTMs cannot model long-range dependencies well. This has become a performance bottleneck for LSTM-based image caption models: existing models do not learn contextual information in longer sentences well. In addition, current image caption models generate text in only a single target language, while many application scenarios require text in different languages. As a widely used technology, image captioning should not be limited by language. This thesis studies both of these issues.

To address the poor support for longer sequences, this thesis analyzes the principles of image caption models and of current advanced machine translation models, and proposes an image caption model based on the machine translation architecture. The model reuses the encoder and decoder structures of the machine translation model, incorporates a pretrained convolutional neural network and several task-specific network components, and can better learn context-dependent information in longer sentences. The model's validity is verified by comparing its performance on datasets with different sentence-length distributions. Experimental results show that the proposed model outperforms LSTM-based image caption models on datasets with longer sentences and more samples.

From the perspective of multi-task learning and machine translation models, this thesis then studies how to remove the language limitation of current models. Based on the hard parameter sharing mode of multi-task learning, a multi-target-language image caption model is proposed, in which multiple decoders each generate a specific target language. Experimental results on Chinese and English datasets show that the proposed model can generate text captions in multiple languages and achieves a measurable performance improvement over existing single-target-language image caption models.

To further improve performance, this thesis studies optimization methods for existing image caption models, optimizing both the image feature extraction stage and the caption generation stage. The effectiveness of these methods is verified by measuring model performance on different datasets and under different optimization parameters. The experimental results show that the English captions generated by the optimized multi-target-language model reach a level competitive with current research.
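The hard parameter sharing scheme described above can be sketched structurally as follows. This is a minimal, framework-free illustration of the idea, not the thesis's implementation: all class and method names are hypothetical, and the encoder and decoders return placeholder strings where the real model would compute Transformer features and generated tokens. The key point it shows is that one encoder is shared across all target languages while each language owns its own decoder parameters.

```python
class SharedEncoder:
    """Stands in for the pretrained CNN + Transformer encoder that all
    target languages share under hard parameter sharing."""
    def encode(self, image):
        # The real model would return encoded image features here;
        # a tagged string serves as a stand-in.
        return f"features({image})"


class LanguageDecoder:
    """One decoder per target language; only these parameters are
    language-specific."""
    def __init__(self, language):
        self.language = language

    def decode(self, features):
        # The real model would autoregressively generate caption tokens.
        return f"[{self.language}] caption from {features}"


class MultiTargetCaptioner:
    """Shared encoder + a dictionary of per-language decoders."""
    def __init__(self, languages):
        self.encoder = SharedEncoder()  # shared across all tasks
        self.decoders = {lang: LanguageDecoder(lang) for lang in languages}

    def caption(self, image, language):
        features = self.encoder.encode(image)  # computed once, reused per language
        return self.decoders[language].decode(features)


model = MultiTargetCaptioner(["en", "zh"])
print(model.caption("img.jpg", "en"))  # English caption from shared features
print(model.caption("img.jpg", "zh"))  # Chinese caption from the same features
```

Because the encoder is shared, gradients from every language's captioning loss would update the same encoder parameters during joint training, which is what distinguishes hard parameter sharing from training separate single-language models.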
Keywords/Search Tags: image caption, deep learning, LSTM, Transformer, multi-task learning