Research On Key Technologies Of Automatic Text Generation Based On Images

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: L B Mo
GTID: 2428330575456459
Subject: Information and Communication Engineering
Abstract/Summary:
In recent years, artificial intelligence has developed rapidly, and research at the intersection of computer vision (CV) and natural language processing (NLP) has attracted growing interest. Most existing work focuses on image captioning, which aims to generate a single-sentence description of a single image. This paper expands both the input and the output: it generates paragraph descriptions from image sequences, a task known as visual storytelling. Compared with image captioning, visual storytelling poses a greater challenge. It requires not only understanding each image in the sequence and the context between images, but also ensuring the coherence of the generated paragraph.

First, this paper explores deep-learning-based visual storytelling algorithms. To address the shortcomings of current visual storytelling methods in image stream modeling and text generation, the first Chinese dataset for this task is constructed, and a retrieval architecture, RST-Att, based on multi-modal space mapping is proposed. On the one hand, RST-Att builds a bidirectional long short-term memory (LSTM) network and incorporates an attention mechanism to improve the modeling of image streams in different scenes. On the other hand, it introduces Rhetorical Structure Theory from linguistics to improve the coherence of the generated text. Experiments on both Chinese and English datasets show that RST-Att outperforms the baseline models.

Furthermore, for the same task, this paper explores a generative method in contrast to the retrieval method and proposes an adversarial learning network, AAL. As a generative method, AAL constructs a reward model in place of maximum likelihood estimation as the learning principle, and uses the generated rewards to optimize the model. In addition, this paper proposes a new granularity of text generation: generating the paragraph at the level of sense groups to improve the coherence of the generated text. Several comparison experiments are designed, and the AAL model obtains better results than the baseline models on both automatic evaluation metrics and human evaluation.

Finally, this paper uses the visual storytelling algorithms proposed above to develop a travel notes generation system that automatically generates travel notes from photo streams uploaded by users. The system mainly comprises a data acquisition module, a travel notes generation module, and a background management and front-end display module.
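The abstract gives no implementation details, so the following is only an illustrative sketch of two ideas it names: attention over an image stream, and retrieval in a shared multi-modal embedding space. It is not the RST-Att model itself. The function names (`attention_pool`, `retrieve`), the dot-product scoring, and the cosine-similarity ranking are all assumptions made for illustration, and the embeddings are taken as given rather than produced by a BiLSTM.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(image_vecs, query):
    """Hypothetical dot-product attention: weight each image vector by its
    similarity to `query`, then return the weighted sum and the weights."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in image_vecs]
    weights = softmax(scores)
    dim = len(image_vecs[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, image_vecs)) for i in range(dim)]
    return pooled, weights


def retrieve(stream_vec, candidates):
    """Hypothetical retrieval step: rank candidate sentences, given as
    (text, embedding) pairs in the same space as `stream_vec`, by cosine similarity."""
    ranked = sorted(candidates, key=lambda c: cosine(stream_vec, c[1]), reverse=True)
    return [text for text, _ in ranked]
```

In this sketch the pooled vector from `attention_pool` would serve as the query passed to `retrieve`. How the thesis actually maps images and text into a common space, and how Rhetorical Structure Theory enters the ranking, is not stated in the abstract.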
Keywords/Search Tags: visual storytelling, neural network, attention mechanism, adversarial learning, sense group