
Temporal Action Detection And Video Caption Algorithm Based On Deep Learning

Posted on: 2020-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: X N Liu
Full Text: PDF
GTID: 2428330575456375
Subject: Information and Communication Engineering

Abstract/Summary:
With the rapid development and popularization of large-capacity storage, multimedia technology, digital devices, computer networks, and communication technologies, the volume of video data on the network has exploded. How to analyze this large and unorganized video data faster and better has become a research hotspot in computer vision. Due to the limitations of traditional video analysis methods and the advantages of deep learning techniques in extracting high-level semantic information from images, video comprehension based on deep learning has become the mainstream approach to intelligent video analysis. Current research on video comprehension includes video action recognition, temporal action detection, object tracking, video summarization, and video captioning.

This paper focuses on two issues: how to effectively detect actions in untrimmed videos from real scenes, and how to establish the connection between visual information and natural language. Current temporal action detection and video captioning algorithms face the following challenges: 1) fixed feature maps give temporal action detection a low recall on actions of varied duration; 2) dense video captioning based on temporal action proposals is divided into two stages, which destroys the interaction between the two tasks. To address these challenges, this paper proposes a multi-scale temporal action detection algorithm based on a feature pyramid network (FPN-TAD) and a jointly optimized dense event captioning algorithm based on descriptive regression (DR-DVC). The FPN-TAD algorithm detects candidate action regions on multi-scale feature maps by introducing the FPN structure, which effectively improves the recall of actions of varied duration. DR-DVC introduces a descriptive loss into the event proposal stage, which encourages proposals to contain more description-relevant information. Considering that different video frames contribute differently to the description results, the descriptive scores can also serve as attention weights for event description, improving description accuracy.

Finally, the paper validates the effectiveness of FPN-TAD and DR-DVC on ActivityNet and other datasets. Comparing the proposed algorithms with the baseline and mainstream algorithms, the results show a clear performance improvement over the baseline and better performance than most mainstream algorithms, which demonstrates the feasibility and effectiveness of the proposed approach.
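The multi-scale idea behind FPN-TAD can be illustrated with a minimal NumPy sketch: a top-down feature pyramid built over 1D temporal feature maps, so that actions of different durations can be detected at different temporal resolutions. The function names, the pointwise lateral projections, and the factor-2 nearest-neighbour upsampling here are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) projection of a temporal feature map: (T, C_in) @ (C_in, D)."""
    return x @ w

def upsample2x(x):
    """Nearest-neighbour temporal upsampling by a factor of 2: (T, D) -> (2T, D)."""
    return np.repeat(x, 2, axis=0)

def temporal_fpn(features, laterals):
    """Build a top-down feature pyramid over 1D temporal feature maps.

    features: list of (T_i, C_i) arrays ordered fine-to-coarse (T halves per level).
    laterals: matching list of (C_i, D) lateral projection matrices.
    Returns a list of (T_i, D) pyramid maps, one per input scale.
    """
    # Start from the coarsest level and merge top-down into finer levels.
    pyramid = [conv1x1(features[-1], laterals[-1])]
    for feat, w in zip(reversed(features[:-1]), reversed(laterals[:-1])):
        merged = conv1x1(feat, w) + upsample2x(pyramid[-1])
        pyramid.append(merged)
    return pyramid[::-1]  # restore fine-to-coarse order

# Example: three temporal scales (8, 4, 2 steps) projected to a common width D=8.
rng = np.random.default_rng(0)
features = [rng.normal(size=(8, 16)), rng.normal(size=(4, 32)), rng.normal(size=(2, 64))]
laterals = [rng.normal(size=(16, 8)), rng.normal(size=(32, 8)), rng.normal(size=(64, 8))]
pyramid = temporal_fpn(features, laterals)
```

Each output map keeps its own temporal resolution but shares the channel width, so a single proposal head could be applied at every scale; long actions would then be picked up on the coarse maps and short ones on the fine maps.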
Keywords/Search Tags: deep learning, temporal action detection, video caption, multi-scale feature, joint optimization