
Dense Video Captioning Based on Part-of-Speech Tagging and Attention

Posted on: 2021-10-27    Degree: Master    Type: Thesis
Country: China    Candidate: Z. J. Zhu    Full Text: PDF
GTID: 2518306107453374    Subject: Computer technology
Abstract/Summary:
In recent years, with the gradual popularization of high-definition video surveillance and the rapid development of short-video social platforms and live-streaming software, video data has grown explosively. How to analyze this massive volume of video data to extract key feature information has gradually become a research focus in intelligent visual analysis. For example, government departments can analyze surveillance footage to obtain the behavioral characteristics of the people in it, and video reviewers can quickly audit content through video descriptions. Research on this problem is therefore of great significance to the development of intelligent video analysis.

Dense video captioning refers to locating the sequence of actions contained in an input video, including the start and end time of each action, and describing these actions in natural language. This thesis studies two aspects of the problem. The first is temporal action proposal generation, i.e., accurately obtaining the start and end times of the actions contained in a video. The second is video description, i.e., describing those temporal actions in natural language.

Current temporal action proposal algorithms consider only features propagated in the forward direction of the video and fail to effectively incorporate features from the reverse direction, resulting in a low recall rate for the generated proposals. At the same time, existing video description algorithms fail to fully fuse video features and temporal action features into a dynamic video feature, and they ignore the temporal information carried by the part-of-speech (POS) tags of words, which makes the generated sentences less accurate.

To address these challenges, this thesis proposes a dense video captioning algorithm based on part-of-speech tagging and an attention mechanism (PosA-DVC). For temporal action proposal generation, a bidirectional Single-Stream Temporal action proposal algorithm based on attention (BiA-SST) is proposed: the forward and reverse features of the temporal actions are extracted by two sequential network models and combined with an attention mechanism, ultimately improving the recall rate of the generated proposals. For description generation, an attention mechanism is used to fuse video features and motion features into dynamic video features; POS tagging information is combined to generate POS temporal features; and finally the POS temporal features, dynamic video features, and word features are combined to dynamically generate the corresponding natural-language description, thereby improving description accuracy.

Experiments with the BiA-SST and PosA-DVC algorithms are carried out on the THUMOS-14 and ActivityNet Captions datasets, respectively. The experimental results are analyzed and compared with related algorithms, demonstrating the feasibility of both BiA-SST and PosA-DVC.
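The attention-based fusion of forward and reverse temporal features described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the scoring vector `w`, the feature shapes, and the per-time-step softmax over the two streams are all assumptions made for the example.

```python
import numpy as np

def attention_fuse(forward_feats, backward_feats, w):
    """Fuse forward and reverse temporal features (T, D) with a
    hypothetical learned attention vector `w` of size D."""
    # Score each stream at every time step: s_t = w . f_t  -> (2, T)
    scores = np.stack([forward_feats @ w, backward_feats @ w], axis=0)
    # Softmax over the two streams at each time step -> weights in (0, 1)
    alpha = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
    # Convex combination of the two streams per time step -> (T, D)
    return alpha[0][:, None] * forward_feats + alpha[1][:, None] * backward_feats

rng = np.random.default_rng(0)
T, D = 5, 8                                # 5 time steps, 8-dim features
fwd = rng.standard_normal((T, D))          # stand-in forward-pass features
bwd = rng.standard_normal((T, D))          # stand-in reverse-pass features
w = rng.standard_normal(D)                 # stand-in attention parameters
fused = attention_fuse(fwd, bwd, w)
print(fused.shape)                         # (5, 8)
```

Because the softmax weights sum to one at each time step, every fused feature vector is a convex combination of the forward and reverse features at that step; in the thesis, the combined representation then feeds the proposal scorer in place of the forward-only features used by a single-direction SST model.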
Keywords/Search Tags: Dense video captioning, temporal action proposal generation, video description, POS tagging, attention