Font Size: a A A

Research And Implementation Of Video Semantic Description Based On Deep Learning

Posted on:2019-08-11Degree:MasterType:Thesis
Country:ChinaCandidate:Y J MiaoFull Text:PDF
GTID:2428330566996066Subject:Information networks
Abstract/Summary:PDF Full Text Request
The purpose of the video semantic description is to generate a natural language description of the video.The problem of video semantic description has been studied for many years.With the success of it in the field of image recognition,CNN provides a great help for the progress of video classification.As a video can be viewed as a temporal content,the LSTM network can provide it with long and short term clues.In this paper,CNN and LSTM networks are merged.CNN is used to extract the video features.Then the video features are input into the first unit of LSTM decoder.Subsequent words are generated from the previously predicted words and the description language is finally generated.In order to obtain more accurate video features that represent video contents better,a wrong temporal-order frame identification model based on an end-to-end multi-branch CNN is proposed,which can identify a sequence of frames which has a wrong temporal-order among several frame sequences.3D convolutional encoding is used to encode the frame sequences of all the input,so the model can extract their temporal and spatial features,and then carry on the contrast analysis,by finding common rules between them,so as to output the wrong temporal-order frames that violate this rule.The Encoder-Decoder framework is adopted to complete the video semantic description system.The encoder uses 3D CNN(C3D)to extract the video features.After obtaining the video features,the feature vectors are input to the LSTM of the decoder to generate the predicted words.Finally,the final description language is obtained through the text summarization method.The proposed methods relies on large-scale datasets.Therefore,the UCF-101 and Youtube2 Text datasets are selected to train and test the model.The experiments results show that the proposed methods are effective.In the end,the video semantic description system based on deep learning is completed and carries on the function test.Through the analysis of test results,the system meets the actual needs.
Keywords/Search Tags:deep learning, video description, frame-sequence identification, two-layer LSTMs
PDF Full Text Request
Related items