
Generating Descriptions For Videos With Attention Model

Posted on: 2019-12-04  Degree: Master  Type: Thesis
Country: China  Candidate: Z W Yang  Full Text: PDF
GTID: 2428330593451030  Subject: Computer Science and Technology
Abstract/Summary:
Exploring the semantic correlation between visual content and natural language is a crucial challenge in multimedia content analysis and computer vision. Recently, the development of deep learning has provided strong technical support for progress on this problem. As consecutive visual content, videos carry rich information, and their temporal and spatial structures are key to video content understanding. Current deep-learning-based video captioning methods design different deep networks to encode video frames and to incorporate their temporal-spatial structure. In contrast to previous work, this thesis focuses on applying attention models to the video captioning task. It introduces two video captioning methods that automatically attend to important video frame regions or video segments while generating descriptions.

The first method considers salient video segments. It introduces an attention model into the language model for video captioning, adaptively selecting salient video segments for each word prediction. The method is evaluated on the popular MSVD benchmark, and the experiments demonstrate that introducing temporal attention at the sentence-generation stage improves video captioning performance.

The second method considers regions of interest within individual video frames and the temporal dependencies between these regions. It uses an attention model, guided by a global feature, to select regions of interest in each video frame. In addition, the thesis designs a dual-memory recurrent model that incorporates the temporal dependencies of the global features and the region-of-interest features respectively, yielding a more discriminative video representation. This second method is also evaluated on the MSVD and M-VAD datasets and achieves strong performance.
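As a rough illustration of the first method's core idea (not the thesis's exact formulation), soft temporal attention scores each video-segment feature against the language model's current hidden state, normalizes the scores with a softmax, and feeds the weighted context vector into the next word prediction. The function names and the dot-product scoring rule below are illustrative assumptions; the thesis may use a different (e.g. learned additive) scoring function.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention(segment_feats, hidden):
    """Soft temporal attention (illustrative sketch).

    segment_feats: list of per-segment feature vectors (lists of floats).
    hidden: the decoder's current hidden state vector.
    Returns the attention-weighted context vector and the weights.
    """
    # Score each segment by its dot product with the hidden state
    # (an assumed scoring rule; learned MLP scorers are also common).
    scores = [sum(f_i * h_i for f_i, h_i in zip(f, hidden))
              for f in segment_feats]
    weights = softmax(scores)
    # Weighted sum of segment features gives the context vector.
    dim = len(segment_feats[0])
    context = [sum(w * f[d] for w, f in zip(weights, segment_feats))
               for d in range(dim)]
    return context, weights

# Usage: the segment aligned with the hidden state receives more weight.
ctx, w = temporal_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

At each decoding step the context vector is recomputed with the new hidden state, which is what lets the model "adaptively select" different salient segments for different words.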
Keywords/Search Tags: Video Captioning, Deep Learning, Attention Model, RNN