
Research On Image Captioning Based On Adversarial Learning

Posted on: 2021-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: H J Du
Full Text: PDF
GTID: 2428330614960351
Subject: Signal and Information Processing
Abstract/Summary:
Image captioning combines image processing with natural language processing. Because both image content and natural language are complex, a network with strong modeling ability is required. In recent years, the Internet and big data have developed rapidly, and neural networks, with their strong data-fitting ability, have succeeded in many fields. Against this background, applying neural networks to image captioning has become a mainstream approach, and optimizing network structures to obtain higher-quality image captions has become a hot research topic.

Previous methods focused on image processing; researchers concentrated on extracting better image features. High-quality image features contain accurate object information, which effectively improves caption quality. However, enhancing image features only strengthens the correlation between text and image: words corresponding to the main content of the image become more likely to be generated, but the text itself is not optimized, and the generated captions may fall short of the standard of natural language.

On the one hand, to address the insufficient accuracy and coherence of text generation, we propose an image caption optimization method based on long-short time intervals. The method uses a deep neural network to extract image features; the key information of the image is represented as a matrix and combined with the ground truth as the input of an LSTM. During caption generation, a long-time-interval optimization module and a short-time-interval optimization module are used to improve caption quality. The long-time-interval module consists of a long-time-interval optimizer and a discriminator; in training, the optimizer improves the semantic relevance between image and text through adversarial training against the discriminator. The short-time-interval module optimizes the generated caption through supervised learning, constraining the phrases and words it uses so that the text is more accurate and coherent. Experimental results show that the proposed method is effective: our model improves the scores of several evaluation metrics.

On the other hand, since people use natural language in diverse ways in daily life, image captions should also be diverse, yet caption diversity has rarely been optimized. We therefore propose an image caption diversity optimization method based on adversarial training. First, the caption generation module produces multiple groups of captions for the same image, and the differences between groups of captions for the same image, i.e., the inter-group differences, are computed; expanding the inter-group differences increases the variety of the generated captions. Second, following the structure of an adversarial network, the inter-group differences are incorporated into the discriminator to guide the generator. Experimental results show that our method effectively improves caption diversity.
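The combination of a supervised short-interval signal and an adversarial long-interval signal can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the network sizes, module names, and the use of one-hot captions for the discriminator are all assumptions made for brevity.

```python
# Sketch: an LSTM caption generator trained with two signals, mirroring the
# long/short time-interval idea: a word-level cross-entropy loss (supervised,
# short interval) and an adversarial loss from a discriminator that scores
# image-caption semantic relevance (long interval). Dimensions are toy values.
import torch
import torch.nn as nn

VOCAB, EMB, HID, IMG = 100, 32, 64, 128

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.img_proj = nn.Linear(IMG, HID)   # image feature -> initial hidden state
        self.embed = nn.Embedding(VOCAB, EMB)
        self.lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, captions):
        h0 = self.img_proj(img_feat).unsqueeze(0)          # (1, B, HID)
        c0 = torch.zeros_like(h0)
        hs, _ = self.lstm(self.embed(captions), (h0, c0))  # teacher forcing
        return self.out(hs)                                # (B, T, VOCAB) logits

class Discriminator(nn.Module):
    """Scores how well a caption (given as per-word distributions) matches an image."""
    def __init__(self):
        super().__init__()
        self.txt = nn.Linear(VOCAB, HID)
        self.img = nn.Linear(IMG, HID)
        self.score = nn.Linear(HID, 1)

    def forward(self, img_feat, word_probs):
        t = self.txt(word_probs).mean(dim=1)  # pool over time steps
        return torch.sigmoid(self.score(torch.tanh(t + self.img(img_feat))))

gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()

# Toy batch: random "image features" and ground-truth captions.
img = torch.randn(4, IMG)
gt = torch.randint(0, VOCAB, (4, 7))

# Discriminator step: real captions vs. generated word distributions.
real = torch.nn.functional.one_hot(gt, VOCAB).float()
fake = gen(img, gt).softmax(-1).detach()
d_loss = bce(disc(img, real), torch.ones(4, 1)) + \
         bce(disc(img, fake), torch.zeros(4, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: supervised (short-interval) + adversarial (long-interval) losses.
logits = gen(img, gt)
sup = ce(logits.reshape(-1, VOCAB), gt.reshape(-1))          # word-level supervision
adv = bce(disc(img, logits.softmax(-1)), torch.ones(4, 1))   # fool the discriminator
g_loss = sup + adv
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

In a full system the generator would sample captions autoregressively rather than use teacher forcing for the adversarial term; the sketch keeps teacher forcing so both losses share one forward pass.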
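The inter-group difference idea can be illustrated with a small sketch. The difference measure below (one minus the average pairwise bigram overlap across groups) is an assumed stand-in for the thesis's actual metric, chosen only to show how a scalar diversity signal could be computed from multiple caption groups for the same image.

```python
# Sketch: sample several groups of captions for one image and measure how much
# the groups differ. A larger inter-group difference indicates more diverse
# captions; such a score could be fed to the discriminator to guide the generator.
from itertools import combinations

def bigrams(caption):
    """Set of adjacent word pairs in a caption."""
    words = caption.split()
    return set(zip(words, words[1:]))

def overlap(a, b):
    """Jaccard overlap between the bigram sets of two captions."""
    ba, bb = bigrams(a), bigrams(b)
    return len(ba & bb) / len(ba | bb) if ba | bb else 1.0

def intergroup_difference(groups):
    """Average (1 - overlap) over all caption pairs drawn from different groups."""
    pairs = [(x, y) for g1, g2 in combinations(groups, 2) for x in g1 for y in g2]
    return sum(1.0 - overlap(x, y) for x, y in pairs) / len(pairs)

# Two hypothetical caption groups sampled for the same image.
g1 = ["a dog runs on the grass", "a dog plays on the grass"]
g2 = ["a brown dog chases a ball", "the dog jumps for a ball"]
diversity_signal = intergroup_difference([g1, g2])
```

Maximizing such a signal during training pushes different sampled groups apart, which is the sense in which expanding inter-group differences increases caption variety.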
Keywords/Search Tags: long-short time interval, adversarial training, inter-group differences