Image captioning is a task that combines vision and language; its main purpose is to use computers to automatically generate descriptive sentences for images. Image captioning has a wide range of applications in assistance systems for visually impaired people, information retrieval, and intelligent transportation. At present, the more mature image captioning models use supervised methods, which are limited by high labeling costs. The rise of unsupervised methods provides a new direction for image captioning. This thesis draws on the ideas of unsupervised methods and integrates related deep learning techniques to study the image captioning task. The main work is as follows:

First, we propose an unsupervised image captioning model based on generative adversarial methods, called GA-based UIC. Considering the limitations of supervised methods in real-world settings, this thesis follows the ideas of unsupervised learning and proposes an unsupervised image captioning model that does not depend on paired datasets. The decoder is a generative adversarial network for text generation, in which both the generator and the discriminator are built from GRUs, a recurrent network with relatively few parameters; this adversarial decoder makes the model unsupervised and avoids a large amount of dataset labeling work. The encoder is a mature convolutional neural network, and the detection model YOLOv3 is used to assist training, which effectively improves the accuracy of the model. Comparing the convergence time and the training and testing time of different models shows that the overall cycle of the proposed model is shorter, and its generated descriptions score 2.6%, 0.7%, 0.6%, and 0.4% higher than the UIC model on BLEU_1, BLEU_2, BLEU_3, and ROUGE, respectively.

Second, we propose unsupervised image captioning based on a residual structure and an attention mechanism, called Res-Att UIC. There
is still a certain gap between current unsupervised image captioning models and the more mature supervised models. The main problems lie in two aspects: continually deepening the convolutional network used to extract image features brings problems such as vanishing gradients and vanishing signals, which hinder model convergence; and the unsupervised model cannot attend to the important parts of the image. To solve these problems, the Res-Att UIC model improves on the GA-based UIC model. First, following the way ResNet solves the vanishing-gradient problem, a residual structure is introduced into the convolutional neural network of the encoder, which effectively avoids the problems brought by deepening the network. Second, an attention mechanism is integrated into the generative adversarial decoder, so that the model attends to different regions of the image while generating text, which avoids wasting computing resources and yields more accurate descriptions. Compared with existing unsupervised image captioning models, the Res-Att UIC model scores 3.7%, 1.2%, 1.3%, 1.9%, and 1.0% higher than the UIC model on BLEU_1, BLEU_2, BLEU_3, BLEU_4, and METEOR, respectively, indicating that the quality of the final descriptions is improved.

In summary, this thesis first proposes GA-based UIC, a fast unsupervised image captioning model based on generative adversarial methods, which realizes unsupervised image captioning without relying on paired datasets. It then proposes Res-Att UIC, an unsupervised image captioning model that fuses a residual structure and an attention mechanism to enhance the quality of the generated description sentences.
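The GA-based UIC decoder above builds its generator and discriminator from GRUs because a GRU has fewer parameters than an LSTM (three weight pairs instead of four). The following is a minimal NumPy sketch of a single GRU cell, not the thesis's implementation; all names and dimensions here are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal single-step GRU (illustrative sketch only). A GRU keeps
    just three weight pairs -- update gate, reset gate, candidate state --
    versus four for an LSTM, which is why it has fewer parameters."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape_in = (hidden_dim, input_dim)
        shape_h = (hidden_dim, hidden_dim)
        self.Wz, self.Uz = rng.normal(0, 0.1, shape_in), rng.normal(0, 0.1, shape_h)
        self.Wr, self.Ur = rng.normal(0, 0.1, shape_in), rng.normal(0, 0.1, shape_h)
        self.Wh, self.Uh = rng.normal(0, 0.1, shape_in), rng.normal(0, 0.1, shape_h)

    def step(self, x, h):
        z = sigmoid(self.Wz @ x + self.Uz @ h)             # update gate
        r = sigmoid(self.Wr @ x + self.Ur @ h)             # reset gate
        h_cand = np.tanh(self.Wh @ x + self.Uh @ (r * h))  # candidate state
        return (1.0 - z) * h + z * h_cand                  # interpolated new state

# Unrolling over a token sequence, as a caption generator would:
cell = GRUCell(input_dim=8, hidden_dim=16)
h = np.zeros(16)
for x in np.random.default_rng(1).normal(0, 1, (5, 8)):  # 5 dummy token embeddings
    h = cell.step(x, h)
```

Because the new state is a convex combination of the old state and a tanh candidate, the hidden activations stay bounded in [-1, 1], which helps the adversarial training remain stable.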
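The two improvements in Res-Att UIC can be sketched in a few lines. This is a generic NumPy illustration of an identity-shortcut residual block and soft attention over image regions, under assumed toy dimensions, not the thesis's actual encoder or attention module.

```python
import numpy as np

def residual_block(x, f):
    """Identity shortcut y = f(x) + x: even when f's gradient shrinks in a
    deep stack, the gradient of y w.r.t. x retains the identity term, which
    is how ResNet-style encoders counter vanishing gradients."""
    return f(x) + x

def soft_attention(query, regions):
    """Soft attention over image-region features (generic sketch): score
    each region against the decoder state, normalize with a softmax, and
    return the attention-weighted context vector."""
    scores = regions @ query / np.sqrt(len(query))  # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over regions
    context = weights @ regions                     # weighted sum of features
    return context, weights

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 32)
y = residual_block(x, lambda v: 0.5 * np.tanh(v))  # toy transform f

regions = rng.normal(0, 1, (49, 32))  # e.g. a 7x7 grid of region features
state = rng.normal(0, 1, 32)          # current decoder hidden state
context, weights = soft_attention(state, regions)
```

At each decoding step the context vector summarizes only the regions relevant to the next word, which is what lets the model "attend to different regions of the image while generating text" instead of reprocessing the whole feature map.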