
Research on Automatically Generating Image Captions

Posted on: 2018-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Shen
Full Text: PDF
GTID: 2348330533961374
Subject: Computer Science and Technology
Abstract/Summary:
Automatically describing an image with sentence-level captions has become a hot research topic in recent years, and the rapid progress of deep learning has greatly promoted the development of image captioning. Among existing methods, the Long Short-Term Memory (LSTM) network is the most widely used: it can store both long-term and short-term memory, and it alleviates the vanishing and exploding gradient problems.

Although related research has achieved strong performance in image captioning, several problems remain: (1) during training, how to model the caption in both directions and learn richer contextual information from the image description; (2) during sampling, how to avoid taking only the prediction at time t-1 as the input at time t, so as to reduce cumulative error and prevent compounding wrong decisions; (3) how to generate higher-quality textual descriptions with a better model.

To address these problems, this thesis presents a method for automatically generating image captions based on a bi-directional Long Short-Term Memory network with scheduled sampling (BLSTM-S). The main contributions are as follows:

(1) We propose a bi-directional Long Short-Term Memory network trained with scheduled sampling. As in a fill-in-the-blank exercise on an English examination, the word that suits a blank depends not only on the forward context of the sentence but also on its backward context. Compared with a unidirectional LSTM, BLSTM-S can learn both the forward and the backward information of the image description and thus generate better captions (a minimal sketch of the idea follows this abstract).

(2) We use scheduled sampling to select the input word at each training step. In contrast to the previous practice of feeding the ground-truth token at every step, scheduled sampling flips a coin: with probability ε it feeds the true previous token, and with probability 1-ε it feeds an estimate produced by the model itself. This reduces the inconsistency between the training and inference stages and avoids compounding bad decisions (see the sketch below).

(3) To obtain better results, during testing we use beam search, keeping the k most probable candidate sequences and taking the highest-probability one as the output at each step (a sketch is also given below).

Finally, to verify the validity and accuracy of the BLSTM-S model, we conduct extensive experiments on the Flickr8k, Flickr30k, and MSCOCO datasets. The results show that our method outperforms related methods on all three datasets.
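The following is a minimal, framework-free sketch of the bidirectional encoding idea in contribution (1). Here `fwd_step` and `bwd_step` are hypothetical single-step recurrent functions standing in for the two LSTM directions; in the thesis's setting they would be the forward and backward LSTM cells.

```python
def bidirectional_encode(tokens, fwd_step, bwd_step):
    """Run a left-to-right and a right-to-left recurrent pass over a
    caption and pair the two hidden states at each position, so every
    word sees both its forward and backward context."""
    h_fwd, state = [], None
    for tok in tokens:                      # forward pass
        state = fwd_step(state, tok)
        h_fwd.append(state)
    h_bwd, state = [], None
    for tok in reversed(tokens):            # backward pass
        state = bwd_step(state, tok)
        h_bwd.append(state)
    h_bwd.reverse()                         # realign with forward order
    return list(zip(h_fwd, h_bwd))          # combined context per word
```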
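A minimal sketch of the scheduled-sampling coin flip from contribution (2), assuming a hypothetical `model_step` callable that returns the model's predicted token for the current step; this illustrates the technique, not the thesis's actual training code.

```python
import random

def scheduled_inputs(true_tokens, model_step, epsilon, start_token):
    """Choose the decoder input at each step by a coin flip: the
    ground-truth previous token with probability epsilon, or the
    model's own previous prediction with probability 1 - epsilon."""
    inputs, prev = [], start_token
    for t, truth in enumerate(true_tokens):
        inputs.append(prev)
        pred = model_step(prev, t)          # model's estimate at step t
        prev = truth if random.random() < epsilon else pred
    return inputs
```

In practice epsilon is typically decayed over the course of training, so the decoder gradually shifts from ground-truth inputs toward its own predictions, matching the inference-time setting.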
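A minimal beam-search sketch for the decoding step in contribution (3), assuming a hypothetical `step_fn(seq)` that returns (token, probability) pairs for the next position; the real decoder would score tokens with the trained BLSTM-S model.

```python
import heapq
import math

def beam_search(step_fn, start_token, end_token, beam_width=3, max_len=20):
    """Keep the beam_width most probable partial captions, expand each
    with candidate next tokens, and return the most probable sequence
    that ends with end_token (or the best partial one at max_len)."""
    beams = [(0.0, [start_token])]            # (cumulative -log prob, tokens)
    completed = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == end_token:          # finished caption: set aside
                completed.append((score, seq))
                continue
            for token, prob in step_fn(seq):  # expand with next-token options
                candidates.append((score - math.log(prob), seq + [token]))
        if not candidates:                    # every beam has finished
            break
        beams = heapq.nsmallest(beam_width, candidates, key=lambda c: c[0])
    completed.extend(b for b in beams if b not in completed)
    return min(completed, key=lambda c: c[0])[1]  # lowest -log prob wins
```

With beam_width=1 this reduces to greedy decoding; larger beams trade extra computation for higher-probability captions.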
Keywords/Search Tags: image captioning, bi-directional Long Short-Term Memory, convolutional neural network, scheduled sampling, stochastic gradient descent