Mask-RCNN Based Image Chinese Caption Generator

Posted on: 2019-11-03 | Degree: Master | Type: Thesis
Country: China | Candidate: B Liu | Full Text: PDF
GTID: 2518306473453754 | Subject: Computer technology
Abstract/Summary:
Image captioning combines computer vision, natural language processing, and machine learning, and it has been both a popular and a difficult problem in recent years. Although researchers have done a great deal of work on the task and pushed it forward, the overall results are still unsatisfactory. Errors in image recognition accumulate during sentence generation, and captions that are completely irrelevant to the image still occur, because machine perception is still in its infancy. As the technology develops, it is hoped that image captioning can solve more real-world problems and help people accomplish more meaningful work.

This thesis proposes using feature maps obtained from the recent Mask R-CNN model as the input reference features of the language generation model. The pre-trained image classification backbone is the stronger ResNet-101. The feature extractor takes the feature map after the RoIAlign layer and adds location and area features; this part can be viewed as bottom-up attention. The caption model uses an attention mechanism. Its LSTM component is modified with reference to the SC-LSTM model from natural language generation: a DA vector is added as a semantic control input to guide the model so that the generated description correlates more strongly with the image. The DA vector is likewise obtained from the object detection model. The loss function is the sum of the cross-entropy loss and a DA vector correlation term, and the model is optimized with stochastic gradient descent.

Good results were obtained on the AIC Chinese dataset: BLEU-4 of 0.575, METEOR of 0.421, CIDEr of 1.882, and ROUGE-L of 0.7. To gather more user feedback, a WeChat mini program exposes an interface that returns both the image description and the object detection results.
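The abstract states only that the loss is the sum of a cross-entropy term and a DA vector correlation term. As a minimal sketch, the code below assumes the DA term takes the form used in the original SC-LSTM formulation: the norm of the final DA vector plus a penalty on how far the DA vector moves between consecutive time steps. The function names and the constants `eta` and `xi` are illustrative choices, not values taken from the thesis.

```python
import math


def cross_entropy(step_probs, target_ids):
    """Sum of negative log-probabilities assigned to the reference tokens.

    step_probs: one probability distribution (list of floats) per time step.
    target_ids: index of the reference token at each time step.
    """
    return -sum(math.log(probs[t]) for probs, t in zip(step_probs, target_ids))


def da_penalty(da_states, eta=1e-4, xi=100.0):
    """SC-LSTM-style regulariser on the DA vector trajectory (assumed form).

    da_states: DA vectors d_0 .. d_T, one per time step.
    The first term pushes the final DA vector toward zero (all semantic
    slots consumed); the second discourages large jumps between steps.
    eta and xi are hypothetical constants chosen for illustration.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    final_term = norm(da_states[-1])
    step_term = sum(
        eta * xi ** norm([b - a for a, b in zip(prev, cur)])
        for prev, cur in zip(da_states, da_states[1:])
    )
    return final_term + step_term


def caption_loss(step_probs, target_ids, da_states):
    """Total loss: cross-entropy plus the DA vector term."""
    return cross_entropy(step_probs, target_ids) + da_penalty(da_states)


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.25, 0.75]]      # two steps, vocabulary of two
    targets = [0, 1]                        # reference token ids
    da = [[1.0, 1.0], [1.0, 0.0], [0.0, 0.0]]  # DA vector decays to zero
    print(caption_loss(probs, targets, da))
```

In a real training loop the same quantity would be computed on tensors and minimized with stochastic gradient descent, as the abstract describes; plain Python lists are used here only to keep the sketch dependency-free.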
Keywords/Search Tags:Image Caption, Object Detection, NLG, Attention