
Research On Image Caption Generation Method Based On Deep Learning

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
Full Text: PDF
GTID: 2428330602997044
Subject: Software engineering
Abstract/Summary:
The rapid progress of society has greatly stimulated the development of intelligent technology. Both image feature extraction and automatic text generation have received great attention from the academic community. In recent years, interdisciplinary research has become more popular, especially deep learning research that fuses images and text, that is, image caption generation. It is a comprehensive research problem involving natural language processing and computer vision. This thesis introduces the research background of image caption generation and the state of research at home and abroad in detail, and analyzes image caption generation models from multiple angles. The specific research content is as follows:

(1) To address the problem that image captions are broad and untargeted, and to strengthen captions for specific regions of an image, a fine-grained image caption method based on multi-level attention is designed. First, a visual attention mechanism fuses global and local fine-grained features. A joint attention mechanism then fuses the visual and label features of the image to generate a text description for a specific region of the image. Finally, an attention-based long short-term memory (LSTM) network serves as the language generation model and generates fine-grained image captions. Experiments show that this method can effectively improve the pertinence and accuracy of image captions.

(2) To address the problem that image caption models ignore text information, a multi-angle, multi-modal image caption method is proposed. The model first takes global and local image features as input and uses a basic encoder-decoder model to generate the first descriptive sentence for the image. The generated sentence is then fed to a sentence encoder network to produce a text semantic feature vector. Next, an attention mechanism fuses the image features with the generated semantic feature vector, and the result is fed to the attention-based language generation model to generate the next sentence. The process repeats for subsequent sentences. Experiments show that this method can effectively generate multi-angle image caption sentences.

(3) To address the limited speed and high time cost of image caption generation in stand-alone mode, an image caption generation method based on the Hadoop big data platform is designed. The model design builds on the multi-angle, multi-modal image caption generation method. It mainly uses a distributed computing framework based on MapReduce and a distributed storage system based on HDFS. Experiments show that this method can train the image caption generation model quickly, greatly reduce the time cost, and improve the efficiency and performance of the model.
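The attention-based fusion steps in (1) and (2) can be illustrated with a small sketch. The abstract does not state which attention formulation the thesis uses, so scaled dot-product attention is assumed here purely for illustration; the function names, feature vectors, and decoder state below are all hypothetical toy values, not the thesis's model.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, features):
    """Scaled dot-product attention (an assumed formulation).

    Returns the attention weights and the weighted sum of `features`.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    context = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(len(query))
    ]
    return weights, context

# Hypothetical toy inputs, all of dimension 4.
global_feat = [0.2, 0.1, 0.4, 0.3]            # whole-image feature
local_feats = [[1.0, 0.0, 0.0, 0.0],          # region features
               [0.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 0.0]]
sentence_vec = [0.0, 0.0, 1.0, 0.0]           # encoding of the previous sentence
decoder_state = [0.5, 0.0, 2.0, 0.0]          # current language-model state

# Step 1: visual attention fuses global and local image features.
visual_weights, visual_ctx = attend(decoder_state, [global_feat] + local_feats)

# Step 2: a second attention fuses the visual context with the text feature.
fusion_weights, fused = attend(decoder_state, [visual_ctx, sentence_vec])
```

In a full model, `fused` would be passed to the LSTM decoder at each generation step, and the query, key, and value vectors would come from learned projections rather than raw features.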
Keywords/Search Tags: Image caption, feature extraction, text generation, neural network, computer vision