Research And Implementation Of Video Semantic Description Based On Deep Learning

Posted on:2019-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:Y J Miao

GTID:2428330566996066

Subject:Information networks

Abstract/Summary:

The purpose of video semantic description is to generate a natural language description of a video. The problem has been studied for many years. Following its success in image recognition, the CNN has greatly aided progress in video classification. Because a video can be viewed as temporal content, an LSTM network can capture its long- and short-term cues. In this thesis, CNN and LSTM networks are combined. The CNN extracts the video features, which are then fed into the first unit of the LSTM decoder. Each subsequent word is generated from the previously predicted words until the full description is produced. To obtain more accurate video features that better represent video content, a wrong temporal-order frame identification model based on an end-to-end multi-branch CNN is proposed; given several frame sequences, it identifies the one whose frames are in the wrong temporal order. 3D convolutional encoding is applied to all input frame sequences so that the model extracts their temporal and spatial features. The model then compares the sequences, finds the rule they have in common, and outputs the sequence whose temporal order violates that rule. The video semantic description system adopts the Encoder-Decoder framework. The encoder uses a 3D CNN (C3D) to extract the video features, and the resulting feature vectors are fed to the LSTM of the decoder to generate the predicted words. The final description is then obtained through a text summarization method. The proposed methods rely on large-scale datasets, so the UCF-101 and YouTube2Text datasets are selected to train and test the model. The experimental results show that the proposed methods are effective. Finally, the deep-learning-based video semantic description system is completed and put through functional testing. Analysis of the test results shows that the system meets practical needs.
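
The wrong temporal-order frame identification task described in the abstract can be illustrated with a small data-construction sketch. The abstract does not say how the out-of-order sequence is produced, so the code below assumes it is made by shuffling the frame indices of one clip; the function name `make_odd_one_out_sample` and its parameters (`num_branches`, `clip_len`) are hypothetical and not taken from the thesis. Each returned clip would feed one branch of the multi-branch CNN, and the returned index would be the classification target.

```python
import random


def make_odd_one_out_sample(num_frames, num_branches=3, clip_len=4, rng=None):
    """Build one sample: `num_branches` clips of frame indices, exactly one
    of which is out of temporal order (assumption: produced by shuffling)."""
    rng = rng or random.Random()
    if clip_len < 2:
        raise ValueError("clip_len must be at least 2 so a clip can be reordered")
    if num_frames < clip_len:
        raise ValueError("video is shorter than one clip")

    # Sample correctly ordered clips of consecutive frame indices.
    clips = []
    for _ in range(num_branches):
        start = rng.randrange(num_frames - clip_len + 1)
        clips.append(list(range(start, start + clip_len)))

    # Pick one branch and shuffle it until its order actually changes.
    odd = rng.randrange(num_branches)
    shuffled = clips[odd][:]
    while shuffled == clips[odd]:
        rng.shuffle(shuffled)
    clips[odd] = shuffled

    return clips, odd


if __name__ == "__main__":
    clips, odd = make_odd_one_out_sample(num_frames=30, rng=random.Random(0))
    for i, clip in enumerate(clips):
        marker = "  <- wrong temporal order" if i == odd else ""
        print(i, clip, marker)
```

In this sketch the clips hold only frame indices; in practice each index list would be used to gather the actual frames before 3D convolutional encoding.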

Keywords/Search Tags:

deep learning, video description, frame-sequence identification, two-layer LSTMs

Related items

1	Research On Video Frame Sequence Prediction Algorithm Based On Deep Learning
2	Multiple Description Coding Based On Deep Learning
3	Research On Image And Video Description Methods Based On Deep Learning
4	Deep Learning For Video Description
5	Research Of Person Re-identification In Video Based On Deep Learning
6	Research And Application Of Video Frame Interpolation Based On Deep Feature
7	Research On Automatic Video Description Based On Deep Reinforcement Learning
8	Research On Video Semantic Description Generation Based On Deep Learning
9	Research On Person Re-identification Of Multi-camera Video Targets Under Deep Learning Conditions
10	Visual Content Description Technology Research Based On Deep Learning