
Research And Application Of Image Captioning Based On Deep Neural Network

Posted on: 2022-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Wang
GTID: 2518306350494734
Subject: Control Science and Engineering
Abstract/Summary:
Image captioning imitates the way humans think: it analyzes the feature information of an input image and generates a text sequence describing the image's content. Many image captioning algorithms have been proposed and have achieved good prediction results, but problems remain. A model's predictions may not match the actual scene, and model structures are often too complex for practical applications. This thesis therefore proposes two image captioning models: an image captioning model based on BDR-GRU (Img-BDR-GRU) and an encoder-decoder network model based on visual guidance (VG-ED).

The first model, Img-BDR-GRU, applies the idea of deep residual learning to design a new residual GRU. To increase the information available for generating the text, a bidirectional recurrent neural network is used, and combining the bidirectional structure with deep residual connections yields the BDR-GRU network. Combining BDR-GRU with a convolutional neural network forms the final image captioning model, Img-BDR-GRU.

The second model, VG-ED, applies the idea of the attention mechanism and designs a new F-LSTM that integrates image information and text information. Although earlier attention mechanisms give good prediction quality, they incur a high computational cost, which hinders the practical deployment of the network. This thesis therefore improves the attention mechanism by using the global image feature vector output by the convolutional part, and uses an adaptive attention mechanism that decides at each time step how much image information to use, according to the current situation. Combining this new model with a convolutional neural network yields the final image captioning model, VG-ED.

Experimental results show that both models improve prediction quality, and that the second model achieves a higher CIDEr score, indicating that its captions are closer to the reference (ground-truth) image descriptions.

To build an intelligent real-time monitoring system that meets practical needs, this thesis also proposes a solution based on the image captioning algorithms. First, GPU resources are virtualized so that multiple image captioning algorithms can use the GPU at the same time. Next, a real-time video streaming server transmits video frame data to the image captioning algorithm server, which analyzes the data and generates a text describing the scene. Finally, a voice broadcast system converts the text to audio and broadcasts it.
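The abstract does not give the BDR-GRU equations. The sketch below is minimal and rests on a hypothetical reading: each layer runs one GRU forward and one backward over the sequence, averages the two directions, and adds the layer input back as a deep-residual shortcut. The names `GRUCell`, `bdr_gru_layer` and `bdr_gru`, the averaging of directions, and the omission of biases are illustrative assumptions, not the thesis's design. It is written in pure Python so it runs without dependencies.

```python
import math
import random


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def vadd(a, b):
    return [p + q for p, q in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def rand_matrix(size, rng, scale):
    return [[rng.uniform(-scale, scale) for _ in range(size)] for _ in range(size)]


class GRUCell:
    """Plain GRU cell (biases omitted for brevity).

    Input and hidden sizes are equal, so the residual shortcut is a plain addition.
    """

    def __init__(self, size, rng, scale=0.1):
        self.size = size
        self.Wz, self.Uz = rand_matrix(size, rng, scale), rand_matrix(size, rng, scale)
        self.Wr, self.Ur = rand_matrix(size, rng, scale), rand_matrix(size, rng, scale)
        self.Wh, self.Uh = rand_matrix(size, rng, scale), rand_matrix(size, rng, scale)

    def step(self, x, h):
        z = sigmoid(vadd(matvec(self.Wz, x), matvec(self.Uz, h)))  # update gate
        r = sigmoid(vadd(matvec(self.Wr, x), matvec(self.Ur, h)))  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(t) for t in vadd(matvec(self.Wh, x), matvec(self.Uh, rh))]
        return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_direction(cell, xs):
    # Unroll one cell over the sequence, starting from a zero hidden state
    h = [0.0] * cell.size
    states = []
    for x in xs:
        h = cell.step(x, h)
        states.append(h)
    return states


def bdr_gru_layer(fwd, bwd, xs):
    hf = run_direction(fwd, xs)
    hb = run_direction(bwd, xs[::-1])[::-1]  # re-align backward states with time steps
    # Hypothetical residual rule: output = input + mean of the two directions
    return [[x + (a + b) / 2.0 for x, a, b in zip(xt, ft, bt)]
            for xt, ft, bt in zip(xs, hf, hb)]


def bdr_gru(xs, layers):
    # layers: list of (forward_cell, backward_cell) pairs stacked into a deep network
    for fwd, bwd in layers:
        xs = bdr_gru_layer(fwd, bwd, xs)
    return xs


rng = random.Random(0)
size = 4
layers = [(GRUCell(size, rng), GRUCell(size, rng)) for _ in range(3)]
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(size)] for _ in range(5)]
outputs = bdr_gru(sequence, layers)
print(len(outputs), len(outputs[0]))  # 5 4
```

Averaging the two directions rather than concatenating them keeps the width constant, so the shortcut `x + h` is well defined at every depth. With concatenation, the shortcut path would need its own projection.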
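The abstract says VG-ED's adaptive mechanism decides at each step how much image information to use, guided by the CNN's global feature vector rather than by per-region attention. One plausible reading is a scalar gate similar in spirit to the visual-sentinel gate of adaptive attention. The sketch below follows that assumption. `adaptive_visual_gate`, its weights `w_h` and `w_v`, and the linear blend are hypothetical, and the internals of F-LSTM are not modeled.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def adaptive_visual_gate(h_t, v_global, w_h, w_v, bias=0.0):
    """Blend the decoder state h_t with the CNN's global image vector v_global.

    Returns (beta, fused). beta near 1 leans on the image at this step;
    beta near 0 leans on the language (decoder) state.
    """
    beta = sigmoid(dot(w_h, h_t) + dot(w_v, v_global) + bias)
    fused = [beta * v + (1.0 - beta) * h for v, h in zip(v_global, h_t)]
    return beta, fused


h_t = [0.2, -0.1, 0.4]        # decoder hidden state at step t (toy values)
v_global = [1.0, 0.5, -0.5]   # global image feature vector from the CNN (toy values)
zeros = [0.0, 0.0, 0.0]

beta, fused = adaptive_visual_gate(h_t, v_global, zeros, zeros)
print(beta)  # 0.5: with zero weights and zero bias the gate splits evenly
```

The gate reads one global vector instead of scoring k spatial regions, so each step costs O(d) rather than O(k·d). That is consistent with the abstract's aim of reducing the computational cost of attention.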
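The deployment flow (video stream, then captioning server, then voice broadcast) is a producer/consumer pipeline. Below is a minimal single-process sketch using Python's standard `queue` and `threading` modules. `caption_frame` and `speak` are hypothetical stand-ins for the captioning model server and the text-to-speech system. The frame-dropping policy for a full buffer is an assumed way of keeping latency bounded and is not specified in the abstract. GPU virtualization is not modeled.

```python
import queue
import threading


def caption_frame(frame):
    # Hypothetical stand-in for the captioning server (e.g. Img-BDR-GRU or VG-ED)
    return f"frame {frame['id']}: a scene description"


def run_pipeline(frames, speak, max_pending=2):
    """Frames -> captioning worker -> broadcaster; returns the number of dropped frames."""
    frame_q = queue.Queue(maxsize=max_pending)  # bounded buffer between stream and model
    text_q = queue.Queue()
    stop = object()  # sentinel that shuts the workers down in order
    dropped = 0

    def captioner():
        while True:
            frame = frame_q.get()
            if frame is stop:
                text_q.put(stop)
                return
            text_q.put(caption_frame(frame))

    def broadcaster():
        while True:
            text = text_q.get()
            if text is stop:
                return
            speak(text)  # hypothetical text-to-speech hook

    workers = [threading.Thread(target=captioner), threading.Thread(target=broadcaster)]
    for w in workers:
        w.start()
    for frame in frames:
        try:
            frame_q.put_nowait(frame)  # never block the live video source
        except queue.Full:
            dropped += 1  # skip this frame rather than accumulate latency
    frame_q.put(stop)
    for w in workers:
        w.join()
    return dropped


spoken = []
dropped = run_pipeline([{"id": i} for i in range(5)], spoken.append, max_pending=8)
print(dropped, len(spoken))  # 0 5
```

A single captioning worker preserves frame order. Running several workers, for example one per model instance sharing a virtualized GPU, would raise throughput, but the broadcaster would then need to reorder texts by frame id.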
Keywords/Search Tags: Image Captioning, Deep Neural Network, Real-time Video Broadcast System, Residual Network, Attention Mechanism