Research And Implementation On Natural Scene Image Caption Based On Deep Learning

Posted on:2021-04-11

Degree:Master

Type:Thesis

Country:China

Candidate:K Y Ma

GTID:2428330614458606

Subject:Integrated circuit engineering

Abstract/Summary:

In recent years, growing volumes of data and breakthroughs in hardware computing power have driven the rapid development of deep learning. Image captioning lies at the intersection of computer vision and natural language processing. Its research value is reflected in assisting visually impaired people, converting between images and text, generating titles automatically, and making machines more intelligent. Traditional template-based and retrieval-based methods tend to produce erroneous or monotonous descriptions and have poor robustness. By combining the encoder-decoder structure from machine translation with deep neural networks, image captioning has improved greatly over these traditional methods. However, problems remain: an insufficiently thorough encoding-decoding process, loss of visual information during decoding, insufficient attention to detail, and inconsistency between the training objective and the evaluation metrics. To address these problems, this thesis studies image captioning based on the deep learning encoder-decoder structure. The main content and research focus of the work are as follows:

1. To address the problem that the visual information of the input image is lost during decoding or cannot be adjusted dynamically, a guided decoding network is used to connect the encoding and decoding parts, so that the encoded information guides decoding at every step and the decoded information is adjusted automatically at the same time, enabling end-to-end training. To extract and parse information fully during encoding and decoding, a dense convolutional network (DenseNet) and multiple instance learning (MIL) are used as the image encoder, and a nested long short-term memory network (NLSTM) is used as the decoder. Experiments show that this model outperforms some popular models.

2. An attention mechanism is introduced to focus on details, and a two-layer decoding structure is constructed, which further improves the model's detail description and semantic richness. In addition, the model structure and optimization methods of deep reinforcement learning are used to train the model by directly optimizing the evaluation metrics themselves, which resolves the inconsistency between training and evaluation standards. Finally, the model was trained and tested on Microsoft's MSCOCO and Yahoo's Flickr30k datasets. The results show improvements of nearly 0.02, 0.03, and 0.08 on the BLEU, METEOR, and CIDEr metrics, respectively, over current popular models.
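
The abstract does not say which form of attention the thesis uses. As a rough illustration of the general idea only, the sketch below computes soft attention over a handful of image region features, assuming a simple dot-product score between each region and the decoder's hidden state. The function names `softmax` and `attend`, the toy feature vectors, and the dot-product scoring are all assumptions made for this example and are not taken from the thesis.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """Soft attention over image regions (illustrative assumption).

    region_features: list of equal-length feature vectors, one per region.
    hidden_state: decoder state vector of the same length.
    Returns (weights, context), where context is the weighted sum of
    the region features.
    """
    # Score each region by its dot product with the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Two toy regions; the decoder state points along the first one.
    regions = [[1.0, 0.0], [0.0, 1.0]]
    state = [2.0, 0.0]
    weights, context = attend(regions, state)
    print("weights:", weights)
    print("context:", context)
```

In this toy setup the first region receives the larger weight because it aligns with the decoder state, which is the sense in which attention lets a decoder focus on particular details at each step. Learned models typically replace the plain dot product with a trainable scoring function.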

Keywords/Search Tags:

deep learning, Image caption, guided decoding, attention mechanism, deep reinforcement learning

Related items

1	Deep Learning-Based Image Caption
2	Research And Implementation Of Key Technologies Of Image Caption Based On Deep Learning
3	Research Of Image Automatically Caption Algorithm Based On Deep Learning
4	Research On Image Caption Based On Deep Learning
5	Research On Image Caption Generation Based On Deep Reinforcement Learning
6	Research On Image Caption Algorithm Based On Deep Learning
7	Image Caption Technology Based On Deep Semantic Information
8	Research On Image Caption Generation Method Based On Deep Learning
9	Research On Image Caption Method Based On Attention Mechanism
10	Image Caption Model Based On Deep Reinforcement Learning