Mask-RCNN Based Image Chinese Caption Generator

Posted on: 2019-11-03

Degree: Master

Type: Thesis

Country: China

Candidate: B Liu

GTID: 2518306473453754

Subject: Computer technology

Abstract/Summary:

Image captioning combines computer vision, natural language processing, and machine learning, and it has been both a popular and a difficult problem in recent years. Although researchers have done a great deal of work on the task and have advanced it, the overall results are still unsatisfactory. Errors in image recognition accumulate when sentences are generated, and captions that are completely irrelevant to the image still occur, because machine perception is still in its infancy. As the technology develops, it is hoped that image captioning can solve more realistic problems and help humans accomplish more meaningful work.

This thesis proposes using the feature maps obtained from the Mask-RCNN model as the input features of the language generation model. The pre-trained image classification backbone is ResNet-101. Region features are extracted from the feature map after the RoIAlign layer and augmented with location and area features; this part can be viewed as bottom-up attention. The caption model uses an attention mechanism. Its LSTM component is modified with reference to the SC-LSTM model from natural language generation: a DA vector is added as a semantic control input that guides generation so that the description correlates more strongly with the image. The DA vector is also obtained from the object detection model. The loss function is the sum of the cross-entropy loss and a DA vector correlation term, and the model is optimized with stochastic gradient descent.

On the AIC Chinese dataset, the model obtains a BLEU-4 of 0.575, a METEOR of 0.421, a CIDEr of 1.882, and a ROUGE_L of 0.7. To collect more user feedback, a WeChat mini program provides an interface that returns both the image description and the object detection results.
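
The step of adding location and area features to each region feature can be illustrated with a minimal sketch. The abstract does not say how the location and area are encoded, so the encoding below (box corners normalized by image size, plus the box area as a fraction of the image) is an assumption, and the names `box_geometry` and `augment_region_features` are hypothetical. The region features are stand-in lists of floats rather than real RoIAlign outputs.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_geometry(box: Box, image_w: float, image_h: float) -> List[float]:
    """Encode a box as [x1/W, y1/H, x2/W, y2/H, relative_area].

    Assumed encoding: the source only says that location and area
    features are added, not how they are computed.
    """
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image size must be positive")
    x1, y1, x2, y2 = box
    nx1, ny1 = x1 / image_w, y1 / image_h
    nx2, ny2 = x2 / image_w, y2 / image_h
    # Clamp to zero so a degenerate box never yields a negative area.
    area = max(nx2 - nx1, 0.0) * max(ny2 - ny1, 0.0)
    return [nx1, ny1, nx2, ny2, area]


def augment_region_features(
    features: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    image_w: float,
    image_h: float,
) -> List[List[float]]:
    """Append the geometry encoding of each box to its region feature."""
    if len(features) != len(boxes):
        raise ValueError("need exactly one box per region feature")
    return [
        list(feat) + box_geometry(box, image_w, image_h)
        for feat, box in zip(features, boxes)
    ]


# Two toy regions with 2-dimensional stand-in features on a 200x100 image.
toy_features = [[0.5, 1.0], [2.0, 3.0]]
toy_boxes = [(0.0, 0.0, 50.0, 100.0), (100.0, 50.0, 200.0, 100.0)]
augmented = augment_region_features(toy_features, toy_boxes, 200.0, 100.0)
```

Each augmented vector is five entries longer than the original region feature. In a full pipeline these vectors would be the inputs that the attention mechanism of the caption model weights at each decoding step.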

Keywords/Search Tags:

Image Caption, Object Detection, NLG, Attention

Related items

1	Mask-RCNN Based Image Chinese Caption Generator
2	Research On Image Caption Based On Object-Attention Model
3	Ultrasound Image Caption Based On Object Detection
4	Research On Image Caption Generation Method Based On Deep Learning
5	Image Caption Method Based On Deep Learning
6	Image Caption Generation With Region Based Attention Scheme
7	Image Caption Research Based On Significant Attention
8	Research On Image Caption Method Based On Attention Mechanism
9	Research On Image Caption Algorithm Based On Attention Mechanism
10	Image Caption Based On Multimodal Attention