Research on Multimodal Learning for Graphic Travelogue Generation

Posted on: 2019-07-24
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Fu
Full Text: PDF
GTID: 2348330545458470
Subject: Information and Communication Engineering
Abstract/Summary:
China's tourism market has grown considerably in recent years. With the development of data sharing, the concept of smart tourism has been proposed, and improving online travel sites with artificial intelligence has become a major concern for the industry. At present, most online travel websites offer users richly illustrated travelogues that combine images and text (graphic travelogues). These travelogues have several problems, however. First, some sites host a large number of travel photo albums with no accompanying travel notes. Second, many travel notes lack illustrative images. Third, the images and travelogues for the same scenic spots are often redundant. To address these problems, we surveyed the related work. After comparing multimodal learning tasks such as visual storytelling and image captioning, we propose a deep-learning-based multimodal visual storytelling model to solve them.

On the algorithmic side, to address potential problems of the traditional visual storytelling model, we propose three improved models. All three use the same convolutional neural network to extract image features and the same language model to extract text features. Using a Bidirectional Long Short-Term Memory (BLSTM) network as the modality transformation structure, we build a long-term memory visual storytelling model. On this basis, we introduce an attention mechanism to construct an attention visual storytelling model, and an adversarial training mechanism to construct an adversarial visual storytelling model. Extensive experiments on Chinese and English datasets verify the validity of the three models, and the adversarial visual storytelling model achieves the best results among them. By introducing the attention and adversarial mechanisms, this thesis constructs a new learning paradigm for the visual storytelling task. The results show that these mechanisms are effective not only in machine translation and image generation but also have great potential in multimodal visual storytelling.

On the engineering side, this thesis uses the three improved models to build a graphic travelogue generation system oriented toward travelogue-writing applications, and makes a series of optimizations to the algorithms and to efficiency. The system is divided into an offline visual storytelling training system and an online travelogue writing system. It can be used for travelogue writing, travelogue filtering, and similar tasks.
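
As a rough illustration of the attention mechanism mentioned in the abstract, the sketch below computes soft attention over the photos of one album: each image feature vector is scored against a query vector, the scores are normalized with a softmax, and the features are combined into a weighted context vector. This is not the thesis's implementation. The dot-product scoring, the function names `softmax` and `attend`, and the toy feature values are all assumptions made for the example; the abstract does not specify how attention is computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, image_features):
    """Soft attention over a sequence of image feature vectors.

    query          -- hypothetical decoder state, a list of floats
    image_features -- one feature vector per photo, same length as query
    Returns (weights, context): the attention weights and the
    weighted sum of the feature vectors.
    """
    # Assumption: dot-product scoring. The abstract does not state
    # which scoring function the thesis uses.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in image_features]
    weights = softmax(scores)
    dim = len(image_features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, image_features))
        for i in range(dim)
    ]
    return weights, context


# Toy album of three photos with 2-dimensional features (made-up numbers).
album = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 0.0]
weights, context = attend(state, album)
```

In a full model the query would come from the text decoder at each step and the features from the convolutional network, so the weights would shift from photo to photo as the story is generated.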
Keywords/Search Tags: multimodal learning, visual storytelling, deep learning