
Research On Image Captioning Based On Adversarial Learning

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: H J Du
Full Text: PDF
GTID: 2428330614960351
Subject: Signal and Information Processing
Abstract/Summary:
Image captioning combines image processing with natural language processing. Because both image content and natural language are complex, a network with strong modeling ability is required. In recent years, the Internet and big data have developed rapidly, and neural networks, with their strong data-fitting ability, have succeeded in many fields. Against this background, applying neural networks to image captioning has become a mainstream approach, and optimizing network structures to obtain higher-quality image captions has become a hot research topic.

Previous methods focused on image processing; researchers concentrated on extracting better image features. High-quality image features contain accurate object information, which effectively improves caption quality. However, enhancing image features only strengthens the correlation between text and image: words corresponding to the main content of the image become more likely to be generated, but the text itself is not optimized, and the generated captions may fall short of the standard of natural language.

On the one hand, to address the insufficient accuracy and coherence of text generation, we propose an image caption optimization method based on long-short time intervals. The method uses a deep neural network to extract image features; the key information of the image is represented as a matrix and combined with the ground truth as the input of an LSTM. During caption generation, a long-time-interval optimization module and a short-time-interval optimization module are used to improve caption quality. The long-time-interval module consists of a long-time-interval optimizer and a discriminator; in training, the optimizer improves the semantic relevance between image and text through adversarial training against the discriminator. The short-time-interval module optimizes the generated caption through supervised learning, constraining the phrases and words it uses so that the text is more accurate and coherent. Experimental results show that the proposed method is effective: our model improves the scores of several evaluation metrics.

On the other hand, since people use natural language in diverse ways in daily life, image captions should also be diverse, yet caption diversity has rarely been optimized. We therefore propose an image caption diversity optimization method based on adversarial training. First, the caption generation module produces multiple groups of captions for the same image, and the differences between groups of captions for the same image, i.e., the inter-group differences, are computed; expanding the inter-group differences increases the variety of the generated captions. Second, following the structure of an adversarial network, the inter-group differences are incorporated into the discriminator to guide the generator. Experimental results show that our method effectively improves caption diversity.
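The combination of a supervised short-interval signal and an adversarial long-interval signal can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the network sizes, module names, and the use of one-hot captions for the discriminator are all assumptions made for brevity.

```python
# Sketch: an LSTM caption generator trained with two signals, mirroring the
# long/short time-interval idea: a word-level cross-entropy loss (supervised,
# short interval) and an adversarial loss from a discriminator that scores
# image-caption semantic relevance (long interval). Dimensions are toy values.
import torch
import torch.nn as nn

VOCAB, EMB, HID, IMG = 100, 32, 64, 128

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG, HID)   # image feature -> initial hidden state
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, captions):
        h0 = self.img_proj(img_feat).unsqueeze(0)          # (1, B, HID)
        c0 = torch.zeros_like(h0)
        hs, _ = self.lstm(self.embed(captions), (h0, c0))  # teacher forcing
        return self.out(hs)                                # (B, T, VOCAB) logits

class Discriminator(nn.Module):
    """Scores how well a caption (given as per-word distributions) matches an image."""
    def __init__(self):
        super().__init__()
        self.txt = nn.Linear(VOCAB, HID)
        self.img = nn.Linear(IMG, HID)
        self.score = nn.Linear(HID, 1)

    def forward(self, img_feat, word_probs):
        t = self.txt(word_probs).mean(dim=1)  # pool over time steps
        return torch.sigmoid(self.score(torch.tanh(t + self.img(img_feat))))

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()

# Toy batch: random "image features" and ground-truth captions.
img = torch.randn(4, IMG)
gt = torch.randint(0, VOCAB, (4, 7))

# Discriminator step: real captions vs. generated word distributions.
real = torch.nn.functional.one_hot(gt, VOCAB).float()
fake = gen(img, gt).softmax(-1).detach()
d_loss = bce(disc(img, real), torch.ones(4, 1)) + \
         bce(disc(img, fake), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: supervised (short-interval) + adversarial (long-interval) losses.
logits = gen(img, gt)
sup = ce(logits.reshape(-1, VOCAB), gt.reshape(-1))          # word-level supervision
adv = bce(disc(img, logits.softmax(-1)), torch.ones(4, 1))   # fool the discriminator
g_loss = sup + adv
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In a full system the generator would sample captions autoregressively rather than use teacher forcing for the adversarial term; the sketch keeps teacher forcing so both losses share one forward pass.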
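The inter-group difference idea can be illustrated with a small sketch. The difference measure below (one minus the average pairwise bigram overlap across groups) is an assumed stand-in for the thesis's actual metric, chosen only to show how a scalar diversity signal could be computed from multiple caption groups for the same image.

```python
# Sketch: sample several groups of captions for one image and measure how much
# the groups differ. A larger inter-group difference indicates more diverse
# captions; such a score could be fed to the discriminator to guide the generator.
from itertools import combinations

def bigrams(caption):
    """Set of adjacent word pairs in a caption."""
    words = caption.split()
    return set(zip(words, words[1:]))

def overlap(a, b):
    """Jaccard overlap between the bigram sets of two captions."""
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 1.0

def intergroup_difference(groups):
    """Average (1 - overlap) over all caption pairs drawn from different groups."""
    pairs = [(x, y) for g1, g2 in combinations(groups, 2) for x in g1 for y in g2]
    return sum(1.0 - overlap(x, y) for x, y in pairs) / len(pairs)

# Two hypothetical caption groups sampled for the same image.
g1 = ["a dog runs on the grass", "a dog plays on the grass"]
g2 = ["a brown dog chases a ball", "the dog jumps for a ball"]
diversity_signal = intergroup_difference([g1, g2])
```

Maximizing such a signal during training pushes different sampled groups apart, which is the sense in which expanding inter-group differences increases caption variety.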
Keywords/Search Tags: long-short time interval, adversarial training, inter-group differences