
Research And Implementation On Natural Scene Image Caption Based On Deep Learning

Posted on: 2021-04-11
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Ma
Full Text: PDF
GTID: 2428330614458606
Subject: Integrated circuit engineering
Abstract/Summary:
In recent years, deep learning has flourished, driven by the growing volume of data to be processed and by breakthroughs in hardware computing capability. Image captioning is a field that combines computer vision and natural language processing. Its research value lies in assisting visually impaired people, converting between pictures and text, generating titles automatically, and making machines more intelligent. Traditional template-based and retrieval-based methods suffer from description errors, monotonous descriptions, and poor robustness. By combining the encoding-decoding structure from machine translation with deep neural networks, image captioning has improved greatly over these traditional methods. However, problems remain: an inadequate encoding-decoding process, loss of visual information during decoding, insufficient attention to detail, and a mismatch between the model's training objective and its evaluation standards. To address these problems, this thesis studies image captioning based on the deep learning encoding-decoding structure. The main content and research focus of the work are as follows:

1. To address the problem that the visual information of the input image is lost during decoding or cannot be adjusted dynamically, a guided decoding network is used to connect the encoding and decoding parts. The encoded information guides the decoding at each step, and the decoded information is adjusted automatically at the same time, which makes the training process end-to-end. To fully extract and parse the information in the encoding-decoding process, dense convolutional networks (DenseNet) and multiple instance learning (MIL) are used as the image encoder, and nested long short-term memory networks (NLSTM) are used as the decoder. Experiments show that this model performs better than some popular models.

2. An attention mechanism is introduced to focus on details, and a two-layer decoding structure is constructed, which further improves the model's detail description and semantic richness. In addition, the model structure and optimization methods of deep reinforcement learning are used to train the model by directly optimizing the same evaluation metrics on which it is assessed, which resolves the inconsistency between training and evaluation standards.

Finally, the model was trained and tested on Microsoft's MSCOCO and Yahoo's Flickr30k datasets. The results show that the model improves on current popular models by nearly 0.02, 0.03, and 0.08 on the BLEU, METEOR, and CIDEr metrics, respectively.
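The idea of training directly on an evaluation metric can be illustrated with a minimal sketch. The abstract does not say which reinforcement learning algorithm the thesis uses, so everything below is an assumption made for illustration: the reward is a toy clipped unigram precision standing in for BLEU, METEOR, or CIDEr, the loss is a plain REINFORCE loss with a baseline, and the baseline is the score of a hypothetical greedy decode. The function names and example captions are invented here and do not come from the thesis.

```python
import math
from collections import Counter


def unigram_precision(candidate, reference):
    """Clipped unigram precision: a toy stand-in for BLEU/METEOR/CIDEr."""
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand_tokens)
    # Each candidate word is credited at most as often as it occurs in the reference.
    matched = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    return matched / len(cand_tokens)


def reinforce_loss(token_log_probs, reward, baseline):
    """REINFORCE loss with a baseline for one sampled caption.

    Minimizing this loss raises the log-probability of captions whose
    reward exceeds the baseline and lowers it for captions that fall short.
    """
    advantage = reward - baseline
    return -advantage * sum(token_log_probs)


# Hypothetical captions, for illustration only.
reference = "a dog runs on the grass"
sampled = "a dog runs on the beach"   # caption sampled from the model
greedy = "a dog on a beach"           # greedy decode used as baseline (assumption)

reward = unigram_precision(sampled, reference)
baseline = unigram_precision(greedy, reference)

# Placeholder per-token log-probabilities; a real decoder would supply these.
log_probs = [math.log(0.5)] * len(sampled.split())
loss = reinforce_loss(log_probs, reward, baseline)
```

Because the reward is computed by the same kind of metric that is reported at test time, the gradient signal points toward higher metric scores instead of toward per-word likelihood, which is the inconsistency the abstract describes.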
Keywords/Search Tags: deep learning, image caption, guided decoding, attention mechanism, deep reinforcement learning