Research On Key Technologies Of Image Caption Based On Multimodal Feature Understanding

Posted on: 2021-02-06

Degree: Master

Type: Thesis

Country: China

Candidate: W X Wei

GTID: 2428330620464204

Subject: Engineering

Abstract/Summary:

Natural language processing and computer vision are two active research areas, and image captioning is an interdisciplinary task that combines them: its goal is to describe an image with a passage of text. Image captioning has wide application in image recognition, intelligent human-computer interaction, and other fields. It is of particular significance in assisting crime scene image analysis, and it can bring great convenience to daily life. In recent years, the encoder-decoder model has become particularly popular for image captioning. This approach has achieved good results, but it still faces many problems. This thesis builds on the encoder-decoder model, and its main work is as follows:

(1) An image comparison model based on complex networks is proposed. The complex channel network is similar to a Siamese network and can compare the similarity of a group of images. The complex triple network can represent the complex-valued features of an image. Compared with real-valued features, complex-valued features capture spatial information better, which makes the image features more accurate. Effective and accurate features are the basis for more accurate image captions. Both kinds of complex network are tested to verify their effect.

(2) An encoder model that integrates multimodal information is proposed to improve the performance of the encoder. By combining the scene graph with a graph neural network, the encoder integrates the relevant scene graph information with the language information of the image's caption text, making more detailed use of both the image and its reference annotation text. This information gives the encoder a stronger understanding ability and better performance. Experiments are carried out to verify the overall performance of the model.

(3) Based on the image captioning model, an image captioning web system with a B/S (browser/server) architecture is designed and implemented. The system consists of three parts: the UI, the data processing part, and the model. The system interface is simple and easy to use, and it can provide accurate image captioning services.
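
As an illustration of the Siamese-style comparison mentioned in item (1), the following is a minimal sketch of how the similarity of two complex-valued feature vectors could be scored. It is not the thesis's model: the function names `hermitian_inner` and `complex_cosine_similarity`, the toy feature values, and the choice of a normalized Hermitian inner product as the similarity measure are all assumptions made for this example.

```python
import math


def hermitian_inner(u, v):
    """Hermitian inner product: sum of u_i * conj(v_i)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a * b.conjugate() for a, b in zip(u, v))


def complex_cosine_similarity(u, v):
    """Magnitude of the normalized Hermitian inner product, in [0, 1].

    Assumed similarity measure for illustration only; the thesis
    abstract does not state which measure its networks use.
    """
    norm_u = math.sqrt(hermitian_inner(u, u).real)
    norm_v = math.sqrt(hermitian_inner(v, v).real)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return abs(hermitian_inner(u, v)) / (norm_u * norm_v)


# Hypothetical complex-valued features, standing in for the outputs of
# two weight-sharing branches of a Siamese-style network.
feat_a = [complex(1, 2), complex(0, -1), complex(3, 0)]
feat_b = [complex(1, 2), complex(0, -1), complex(3, 0)]
feat_c = [complex(2, -1), complex(1, 0), complex(0, 3)]

sim_same = complex_cosine_similarity(feat_a, feat_b)
sim_diff = complex_cosine_similarity(feat_a, feat_c)
print(f"identical features: {sim_same:.3f}")
print(f"different features: {sim_diff:.3f}")
```

Because the score takes the magnitude of the inner product, it is unchanged when one vector is multiplied by a global unit phase. Whether that invariance is desirable depends on how the features are produced, which the abstract does not specify.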

Keywords/Search Tags:

image caption, neural network, encoder-decoder, image caption system

Related items

1	Research Of Image Caption Based On Encoder-Decoder
2	Image Caption Model Based On Feature Extraction Via Dense Convolutional Neural Network
3	Image Caption Technology Based On Deep Semantic Information
4	Research On Image Caption Generation Method Based On Deep Learning
5	Research And Application On Topic-specific Image Caption Generation Technique
6	Research On Image Caption Based On Attention Mechanism
7	Research And System Implementation Of Image Caption Based On Deep Learning
8	Research On Image Caption Algorithm Based On Attention Mechanism
9	Image Caption Method Based On Deep Learning
10	Research On Image Semantic Caption Generation Based On Encoder-Decoder Framework