
Research and Application of Image Captioning Methods

Posted on: 2021-02-16 | Degree: Master | Type: Thesis
Country: China | Candidate: C R Long | Full Text: PDF
GTID: 2428330614460387 | Subject: Computer application technology
Abstract/Summary:
Image captioning aims to translate an image into a complete, natural sentence and involves both computer vision and natural language processing. On the one hand, although image captioning has achieved good results thanks to the rapid development of deep neural networks, an excessive pursuit of scores on the standard evaluation metrics makes the generated descriptions too conservative for practical applications; it is necessary to increase the diversity of the generated text and to account for prior knowledge such as a user's preferred vocabulary and writing style. On the other hand, image captioning usually requires a large set of training image-sentence pairs, and acquiring sufficient pairs is expensive in practice, which limits recent captioning models in their ability to describe objects outside the training corpora (i.e., novel words). How to reduce the dependence on paired image-sentence data, learn the domain variance between different datasets, and exploit other available annotations to train a captioning model well therefore becomes increasingly important. To address the problems of personalization, domain variance, and novel words in image captioning, the main contributions of this dissertation are as follows:

(1) This dissertation proposes a personalized image captioning method that generates sentences describing a user's own story and life experience in the word expressions that user prefers. The method flexibly models user interests by embedding user IDs as interest vectors: the information unique to each user, such as image features, the user ID, and the user's content, is used to construct a characteristic interest vector. Combined with a top-down attention mechanism, the interest vector better guides the training of the language model and yields descriptions that conform to the user's style (sketched below). The effectiveness of the method is verified on datasets from the Instagram and Lookbook platforms.

(2) This dissertation proposes simple and effective domain-invariant constraints for learning cross-domain caption generation models that can be applied to different data platforms. By constructing distance-based domain constraints for the model, the domain shift between sentence-level features of the source and target domains is minimized in the hidden space and shared subspace features are learned (sketched below). A domain-shared dictionary is proposed at the same time to enrich sentence generation across data domains. To further exploit the private characteristics of each data domain, the dissertation also uses a domain-classifier mechanism to guide the language model to generate sentences for a specific data domain. Experimental results demonstrate the effectiveness of the method.

(3) This dissertation applies a language model with a copy mechanism to food-analysis datasets. The model can directly "copy" suitable words from the candidate words generated for an image, including novel words that never appear in the paired image-text dataset, into the output sentence, thereby generating descriptions that contain novel words (sketched below). By embedding the copy mechanism in a conventional end-to-end sequence generation model and assisting it with an effective object detection model, the language model learns to generate descriptions of novel words. Experimental results demonstrate the effectiveness of the method.
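The personalized captioning method in (1) conditions the decoder on a user interest vector obtained by embedding the user ID and combines it with attended image features. The following is a minimal, illustrative PyTorch sketch of one such decoding step; the class name UserAwareDecoder, the dimensions, and the simple additive attention are assumptions made for illustration, not details taken from the dissertation.

```python
import torch
import torch.nn as nn

class UserAwareDecoder(nn.Module):
    """One-step caption decoder conditioned on a user interest vector.

    Hypothetical sketch: user IDs are embedded as interest vectors and
    concatenated with the attended image feature and the previous word
    embedding before each LSTM step (names and sizes are illustrative).
    """

    def __init__(self, vocab_size, num_users, feat_dim=2048,
                 embed_dim=512, user_dim=128, hidden_dim=512):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.user_embed = nn.Embedding(num_users, user_dim)    # interest vector
        self.att = nn.Linear(feat_dim + hidden_dim, 1)         # additive attention score
        self.lstm = nn.LSTMCell(embed_dim + feat_dim + user_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, regions, prev_word, user_id, state):
        # regions: (B, R, feat_dim) region features extracted from the image
        h, c = state
        # top-down style attention: score each region given the current hidden state
        scores = self.att(torch.cat(
            [regions, h.unsqueeze(1).expand(-1, regions.size(1), -1)], dim=-1))
        alpha = torch.softmax(scores, dim=1)                   # (B, R, 1)
        context = (alpha * regions).sum(dim=1)                 # attended image feature
        # condition the step on the previous word, visual context, and user interest vector
        x = torch.cat([self.word_embed(prev_word),
                       context,
                       self.user_embed(user_id)], dim=-1)
        h, c = self.lstm(x, (h, c))
        return self.out(h), (h, c)                             # word logits, new state

# toy usage with random inputs
decoder = UserAwareDecoder(vocab_size=1000, num_users=50)
regions = torch.randn(2, 36, 2048)
state = (torch.zeros(2, 512), torch.zeros(2, 512))
logits, state = decoder(regions, torch.tensor([1, 2]), torch.tensor([3, 7]), state)
```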
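The domain-invariant constraint in (2) minimizes a distance between sentence-level features of the source and target domains in the hidden space. The sketch below uses a simple first-moment (mean-feature) distance as a stand-in; the dissertation's actual distance measure, loss weight, and feature definition are not reproduced here, and the names are hypothetical.

```python
import torch

def domain_distance_loss(src_feats, tgt_feats):
    """Distance between sentence-level features of the source and target domains.

    Illustrative stand-in for a distance-based domain constraint: the squared
    L2 distance between the mean feature of each domain batch (first-moment
    matching); the dissertation's exact distance measure may differ.
    """
    return (src_feats.mean(dim=0) - tgt_feats.mean(dim=0)).pow(2).sum()

# sentence-level features of a source-domain batch and a target-domain batch,
# e.g. final decoder hidden states (random placeholders here)
src_feats = torch.randn(32, 512, requires_grad=True)
tgt_feats = torch.randn(32, 512, requires_grad=True)
caption_loss = torch.tensor(0.0)   # placeholder for the usual cross-entropy caption loss
total_loss = caption_loss + 0.1 * domain_distance_loss(src_feats, tgt_feats)
total_loss.backward()
```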
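The copy mechanism in (3) lets the decoder choose, at each step, between generating a word from the caption vocabulary and copying a candidate word proposed by the object detector, which is how novel words can enter the output sentence. The sketch below shows one gated mixture step; CopyStep, the extended-vocabulary indexing, and all sizes are illustrative assumptions rather than the dissertation's implementation.

```python
import torch
import torch.nn as nn

class CopyStep(nn.Module):
    """One decoding step with a copy gate over detected candidate words.

    The final word distribution is a gated mixture of (a) the usual softmax
    over the caption vocabulary and (b) attention weights over candidate
    words proposed by an object detector, scattered into an extended
    vocabulary that can contain novel words.
    """

    def __init__(self, hidden_dim, vocab_size, extended_size):
        super().__init__()
        self.gen = nn.Linear(hidden_dim, vocab_size)   # generate from the base vocabulary
        self.copy_gate = nn.Linear(hidden_dim, 1)      # probability of copying at this step
        self.vocab_size = vocab_size
        self.extended_size = extended_size             # base vocabulary + detector labels

    def forward(self, h, cand_scores, cand_ids):
        # h: (B, hidden); cand_scores: (B, K) scores over detected candidates;
        # cand_ids: (B, K) indices of those candidates in the extended vocabulary
        p_copy = torch.sigmoid(self.copy_gate(h))                 # (B, 1)
        gen_dist = torch.softmax(self.gen(h), dim=-1)             # (B, V)
        copy_dist = torch.softmax(cand_scores, dim=-1)            # (B, K)
        out = torch.zeros(h.size(0), self.extended_size)
        out[:, :self.vocab_size] = (1 - p_copy) * gen_dist        # generated words
        out.scatter_add_(1, cand_ids, p_copy * copy_dist)         # copied (possibly novel) words
        return out                                                # (B, extended_size)

# toy usage: 2 images, 5 detected candidate words each, 100-word base vocabulary
step = CopyStep(hidden_dim=512, vocab_size=100, extended_size=120)
probs = step(torch.randn(2, 512),
             torch.randn(2, 5),
             torch.randint(100, 120, (2, 5)))
```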
Keywords/Search Tags: image captioning, domain adaptation, personalization, novel words