
Research On Image Caption Model Of Multi-target Language

Posted on: 2021-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: Y L Zhang  Full Text: PDF
GTID: 2428330611499757  Subject: Computer technology
Abstract/Summary:
The image caption task lies at the intersection of natural language processing and computer vision and has attracted the interest of many scholars. With the rise of artificial intelligence, many effective image caption models have been proposed. However, most of them use Long Short-Term Memory networks (LSTMs) as generators, and LSTMs cannot model long-range dependencies well. This has become a performance bottleneck for LSTM-based image caption models: existing models do not learn contextual information in longer sentences well. In addition, current image caption models generate text in only a single target language, while many application scenarios require text in different languages. As a widely used technology, image captioning should not be limited by language. This thesis studies both of these issues.

To address the poor support for longer sequences, this thesis analyzes the principles of image caption models and of current advanced machine translation models, and proposes an image caption model based on the machine translation architecture. The model reuses the encoder and decoder structures of the machine translation model, incorporates a pretrained convolutional neural network and several task-specific network components, and can better learn context-dependent information in longer sentences. The model's validity is verified by comparing its performance on datasets with different sentence-length distributions. Experimental results show that the proposed model outperforms LSTM-based image caption models on datasets with longer sentences and more samples.

From the perspective of multi-task learning and machine translation models, this thesis then studies how to remove the language limitation of current models. Based on the hard parameter sharing mode of multi-task learning, a multi-target-language image caption model is proposed, in which multiple decoders each generate a specific target language. Experimental results on Chinese and English datasets show that the proposed model can generate text captions in multiple languages and achieves a measurable performance improvement over existing single-target-language image caption models.

To further improve performance, this thesis studies optimization methods for existing image caption models, optimizing both the image feature extraction stage and the caption generation stage. The effectiveness of these methods is verified by measuring model performance on different datasets and under different optimization parameters. The experimental results show that the English captions generated by the optimized multi-target-language model reach a level competitive with current research.
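The hard parameter sharing scheme described above can be sketched structurally as follows. This is a minimal, framework-free illustration of the idea, not the thesis's implementation: all class and method names are hypothetical, and the encoder and decoders return placeholder strings where the real model would compute Transformer features and generated tokens. The key point it shows is that one encoder is shared across all target languages while each language owns its own decoder parameters.

```python
class SharedEncoder:
    """Stands in for the pretrained CNN + Transformer encoder that all
    target languages share under hard parameter sharing."""
    def encode(self, image):
        # The real model would return encoded image features here;
        # a tagged string serves as a stand-in.
        return f"features({image})"


class LanguageDecoder:
    """One decoder per target language; only these parameters are
    language-specific."""
    def __init__(self, language):
        self.language = language

    def decode(self, features):
        # The real model would autoregressively generate caption tokens.
        return f"[{self.language}] caption from {features}"


class MultiTargetCaptioner:
    """Shared encoder + a dictionary of per-language decoders."""
    def __init__(self, languages):
        self.encoder = SharedEncoder()  # shared across all tasks
        self.decoders = {lang: LanguageDecoder(lang) for lang in languages}

    def caption(self, image, language):
        features = self.encoder.encode(image)  # computed once, reused per language
        return self.decoders[language].decode(features)


model = MultiTargetCaptioner(["en", "zh"])
print(model.caption("img.jpg", "en"))  # English caption from shared features
print(model.caption("img.jpg", "zh"))  # Chinese caption from the same features
```

Because the encoder is shared, gradients from every language's captioning loss would update the same encoder parameters during joint training, which is what distinguishes hard parameter sharing from training separate single-language models.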
Keywords/Search Tags: image caption, deep learning, LSTM, Transformer, multi-task learning