Automatic Generation Of Content Based On Deep Learning

Posted on: 2020-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: Y Han  Full Text: PDF
GTID: 2428330596468157  Subject: Software engineering
Abstract/Summary:
With the spread of electronic devices, web services and applications, the amount of data generated worldwide is growing exponentially, which provides a necessary condition for entering the era of artificial intelligence. Advances in big data, algorithms and computing power now make it possible to deploy various AI applications in practice, such as machine translation, image restoration and automatic video subtitling. These problems are collectively referred to as automatic content generation.

Automatic content generation has broad application prospects in many fields. For an enterprise that specializes in web services, once designers have designed the front-end pages, engineers can write programs that automatically convert those pages into the corresponding HTML code and so avoid duplicated work; this belongs to the category of image understanding. For advertising companies, which need to provide high-quality online advertising posters in various styles for different customers, image generation is particularly important.

This thesis studies the image captioning problem and the image generation problem, both of which are challenging. Image captioning requires image information and text information to be handled simultaneously, so it lies at the intersection of computer vision and natural language processing. For image generation, the emphasis is on avoiding mode collapse and on handling long-range dependencies. The research contents are as follows:

(1) A method based on object detection for the image captioning problem
A method based on object detection is proposed for image captioning. To capture both the overall features and the local features of an image, the approach works as follows: given an original image, the object detector MRCNN is used to cut out the detected objects as object images. A CNN and a Bi-LSTM then model the captioning problem from the original image, the object images and the corresponding text. The experimental results show that the Bi-LSTM lets the model consider both preceding and following context at each time step, and that the object detector helps the model detect the details of the page. Compared with traditional CNN-RNN methods, the model based on object detection further improves the B@2 and ROUGE scores, reaching 80.00 and 78.54 respectively.

(2) A method based on fully convolutional operation for the image captioning problem
A method based on fully convolutional operation is also proposed for image captioning. Because RNNs handle long text sequences poorly, this thesis replaces the sequential RNN with fully convolutional operations and uses a Gated CNN together with residual layers to model sentences. The experimental results show that replacing the ordinary sequential model with fully convolutional operations benefits long text modeling. The Gated CNN lets the model control the flow of information at a given time step, selectively forgetting information and absorbing certain input information. The residual layers allow the model to be deeper while avoiding problems such as model degradation. Compared with traditional CNN-RNN methods, the model based on fully convolutional operation further improves the B@n and CIDEr scores, reaching 86.12, 80.56, 74.02, 66.34 and 3.59, respectively.

(3) A method based on Transformer-encoder architecture for the image generation problem
A method based on the Transformer-encoder architecture is proposed for image generation. To handle long-range dependencies, the model architecture draws on the encoder block of the Transformer model, and the loss function of WGAN replaces that of DCGAN. The experimental results show that using WGAN reduces the mode collapse problem to a certain extent; that the multi-head self-attention mechanism helps the model handle both global and local features; and that layer normalization keeps the model from being disturbed by other samples in a batch. Compared with the traditional DCGAN method, the images generated by this method are more diverse, and the Inception Score is further improved, reaching 1.72.
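
The gating and residual ideas in contribution (2) can be illustrated with a small sketch. It assumes the common gated-linear-unit form, in which one causal convolution produces the content and a second one, passed through a sigmoid, produces the gate; the abstract does not give the thesis's exact layer definition. The function names, the scalar-per-step features and the kernel values are hypothetical and chosen only for readability.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def causal_conv1d(xs, kernel, bias):
    """Causal 1-D convolution over a scalar sequence.

    The input is left-padded with zeros so that output t depends only on
    xs[t - k + 1 .. t], never on later positions.
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k)) + bias
        for t in range(len(xs))
    ]


def gated_residual_block(xs, w, b, v, c):
    """Assumed form: y_t = x_t + conv_w(x)_t * sigmoid(conv_v(x)_t).

    The sigmoid gate (values in (0, 1)) scales how much of the convolved
    content passes through; the added x_t is the residual shortcut.
    """
    content = causal_conv1d(xs, w, b)
    gate = [sigmoid(g) for g in causal_conv1d(xs, v, c)]
    return [x + h * g for x, h, g in zip(xs, content, gate)]


if __name__ == "__main__":
    sequence = [1.0, 2.0, 3.0, 4.0]
    print(gated_residual_block(sequence, [0.3, 0.7], 0.0, [0.2, 0.1], 0.0))
```

When the content branch outputs zero, the block reduces to the identity because of the residual path, which is one reason such blocks can be stacked deeply.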
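
To make the loss swap in contribution (3) concrete, the following sketch contrasts the standard cross-entropy discriminator loss used by DCGAN with the WGAN critic and generator losses in their textbook forms. The function names and sample values are hypothetical; the thesis's actual training code is not shown in the abstract.

```python
import math


def dcgan_discriminator_loss(real_probs, fake_probs):
    """Binary cross-entropy on discriminator probabilities in (0, 1)."""
    real_term = sum(-math.log(p) for p in real_probs) / len(real_probs)
    fake_term = sum(-math.log(1.0 - p) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def wgan_critic_loss(real_scores, fake_scores):
    """WGAN critic loss on unbounded scores: mean(fake) - mean(real).

    Minimizing this pushes real scores up and fake scores down.
    """
    mean_fake = sum(fake_scores) / len(fake_scores)
    mean_real = sum(real_scores) / len(real_scores)
    return mean_fake - mean_real


def wgan_generator_loss(fake_scores):
    """The generator tries to raise the critic's score on generated samples."""
    return -sum(fake_scores) / len(fake_scores)


if __name__ == "__main__":
    print(dcgan_discriminator_loss([0.9, 0.8], [0.2, 0.1]))
    print(wgan_critic_loss([2.0, 4.0], [1.0, 1.0]))
    print(wgan_generator_loss([1.0, 3.0]))
```

The WGAN formulation also requires the critic to be approximately Lipschitz-constrained (the original WGAN clips the critic's weights); that step is omitted here for brevity.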
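
The claim in contribution (3) that layer normalization shields a sample from the rest of its batch can be checked with a short sketch. It compares per-sample layer normalization with per-feature batch normalization using training-mode statistics; the learnable scale and shift parameters are left out, and all names and values are illustrative assumptions rather than the thesis's implementation.

```python
import math


def layer_norm(sample, eps=1e-5):
    """Normalize one sample using the mean and variance of its own features."""
    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / len(sample)
    return [(x - mean) / math.sqrt(var + eps) for x in sample]


def batch_norm(batch, eps=1e-5):
    """Normalize each feature using statistics computed across the batch."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [
        sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0]
    batch_a = [sample, [0.0, 0.0, 0.0]]
    batch_b = [sample, [10.0, -5.0, 7.0]]
    # Layer norm output for `sample` is identical regardless of its batch mates.
    print(layer_norm(sample))
    # Batch norm output for the same sample changes with the other batch member.
    print(batch_norm(batch_a)[0])
    print(batch_norm(batch_b)[0])
```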
Keywords/Search Tags: image caption, image generation, object detector, fully convolutional operation, Transformer-encoder