Research On Key Technologies Of Image Caption Based On Multimodal Feature Understanding

Posted on: 2021-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: W X Wei
GTID: 2428330620464204
Subject: Engineering

Abstract/Summary:
Natural language processing and computer vision are two active research areas. Image captioning, the task of describing an image with a passage of text, lies at the intersection of the two. It has wide applications in image recognition, intelligent human-computer interaction, and other fields, and is especially valuable for assisting the analysis of crime scene images, which is of great significance and can bring considerable convenience to daily life. In recent years, the encoder-decoder model has become particularly popular for image captioning. Although it achieves good results, it still faces many problems. This thesis builds on the encoder-decoder model, and its main tasks are as follows:

(1) An image contrast model based on complex-valued networks is proposed. The complex channel network, which is similar to a Siamese network, compares the similarity of a group of images, while the complex triplet network represents images with complex-valued features. Compared with real-valued features, complex-valued features have better spatial expressiveness, which makes the image features more accurate, and effective, accurate features are the basis for more accurate captions. Both kinds of complex network are evaluated experimentally.

(2) An encoder model that integrates multimodal information is proposed to improve the encoder. By combining scene graphs with a graph neural network, the encoder integrates the scene graph information of an image with the language information in its caption text, making fuller use of both the image and its ground-truth annotation text. This additional information gives the encoder a stronger understanding ability and better performance, and experiments verify the overall performance of the model.

(3) Based on the image caption model, an image captioning web system with a browser/server (B/S) architecture is designed and implemented. The system consists of three parts: the user interface, the data processing module, and the model. Its interface is simple and easy to use, and it provides accurate image captioning services.
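The complex-valued image comparison in task (1) can be illustrated with a minimal sketch, assuming each image has already been encoded as a complex-valued feature vector. The vectors below are hypothetical toy values, not outputs of the thesis's networks, and the abstract does not state which distance or loss the author used: this sketch assumes Euclidean distance in C^n, a hypothetical distance-to-similarity mapping for the Siamese-style comparison, and a standard hinge-style triplet margin loss for the triplet setting. Each complex component carries both a magnitude and a phase, which is one common argument for the richer representation described above.

```python
import math
from typing import Sequence

# Hypothetical sketch: complex-valued feature vectors as plain Python complex lists.
ComplexVec = Sequence[complex]


def complex_distance(a: ComplexVec, b: ComplexVec) -> float:
    """Euclidean distance in C^n: sqrt(sum_i |a_i - b_i|^2)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    # abs() of a complex number is its modulus, so phase differences count too.
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))


def siamese_similarity(a: ComplexVec, b: ComplexVec) -> float:
    """Assumed mapping from distance to similarity in (0, 1]; 1.0 means identical."""
    return 1.0 / (1.0 + complex_distance(a, b))


def triplet_loss(anchor: ComplexVec, positive: ComplexVec,
                 negative: ComplexVec, margin: float = 1.0) -> float:
    """Hinge triplet loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    d_pos = complex_distance(anchor, positive)
    d_neg = complex_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy features (hypothetical values, not from any trained network).
    anchor = [1 + 1j, 0.5 - 0.5j]
    positive = [1 + 0.9j, 0.5 - 0.4j]   # stands in for a similar image
    negative = [-1 + 2j, 2 + 0j]        # stands in for a dissimilar image
    print(siamese_similarity(anchor, positive))
    print(siamese_similarity(anchor, negative))
    print(triplet_loss(anchor, positive, negative))
```

In a trained system the feature vectors would come from the complex-valued network itself, and the loss would drive those features so that similar images end up closer together than dissimilar ones.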
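The multimodal encoder in task (2) combines scene graph information with caption text. One way to picture this is one round of graph message passing over the scene graph followed by fusion with a text embedding. This is a minimal sketch under stated assumptions: the object features, the mean-aggregation rule, treating relations as undirected, ignoring the predicate labels, and concatenation as the fusion step are all hypothetical choices for illustration. The abstract does not detail the thesis's actual graph neural network or fusion method.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical relation format: (subject, predicate, object), e.g. ("man", "riding", "horse")
Relation = Tuple[str, str, str]


def message_passing(node_feats: Dict[str, Vector],
                    relations: List[Relation]) -> Dict[str, Vector]:
    """One round of mean aggregation over the scene graph.

    Each object node is replaced by the element-wise average of its own
    feature and the features of the objects it is related to. Relations are
    treated as undirected and the predicate is ignored (simplifying assumptions).
    """
    neighbours: Dict[str, List[str]] = {node: [] for node in node_feats}
    for subj, _predicate, obj in relations:
        neighbours[subj].append(obj)
        neighbours[obj].append(subj)

    updated: Dict[str, Vector] = {}
    for node, feat in node_feats.items():
        group = [feat] + [node_feats[m] for m in neighbours[node]]
        # Average element-wise across the node and its neighbours.
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


def fuse_with_text(node_feats: Dict[str, Vector], text_emb: Vector) -> Vector:
    """Mean-pool the node features, then concatenate a caption-text embedding."""
    if not node_feats:
        raise ValueError("scene graph has no nodes")
    pooled = [sum(vals) / len(node_feats) for vals in zip(*node_feats.values())]
    return pooled + list(text_emb)


if __name__ == "__main__":
    # Hypothetical 2-d object features and a toy scene graph.
    objects = {"man": [1.0, 0.0], "horse": [0.0, 1.0], "field": [0.0, 0.0]}
    relations = [("man", "riding", "horse"), ("horse", "on", "field")]
    text_embedding = [0.2, 0.8]  # stands in for an encoded caption sentence

    graph_feats = message_passing(objects, relations)
    print(graph_feats)
    print(fuse_with_text(graph_feats, text_embedding))
```

Mean aggregation is used here only because it is the simplest order-independent update. A real graph neural network encoder would typically apply learned weight matrices and nonlinearities at each step and could also encode the relation predicates.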
Keywords/Search Tags: image caption, neural network, encoder-decoder, image caption system