
One-Stage Temporal Action Detection For Open-Set Scenario

Posted on: 2024-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: J S Hu
Full Text: PDF
GTID: 2568306932955819
Subject: Information and Communication Engineering
Abstract/Summary:
Temporal signal event detection aims to locate the start and end times of events of interest in continuous temporal signals while identifying the category of each event. It has a wide range of applications in daily life and in military fields. This dissertation takes temporal action detection as a practical task and studies how to detect human action events in untrimmed video, recognizing the action category and locating the start and end times simultaneously. Although some progress has been made in temporal action detection, existing methods still suffer from inaccurate action boundary detection and a high false positive rate, and these problems seriously hinder the practical application of the technology. In response to these challenges, this dissertation studies one-stage temporal action detection in open-set scenarios. The main research contents are:

1) To address the inaccurate boundary detection of one-stage temporal action detection methods, this dissertation proposes a boundary- and region-based algorithm for evaluating action proposal confidence, which uses both global and local information to measure the confidence of an action proposal. Using the two kinds of information jointly makes the confidence evaluation more accurate and, as a result, improves the accuracy of boundary detection. Experimental results indicate that the method achieves the best performance on two commonly used benchmark datasets.

2) To tackle the high false positive rate in action detection, this dissertation proposes a temporal action representation method based on sharpness minimization. To capture long-term action dependencies while avoiding interference from background information, the self-attention mechanism of the Transformer is used to extract human motion-related details, resulting in a high-quality action representation. Additionally, the network is trained with the sharpness minimization algorithm, which leads to higher generalization capability. Experiments on multiple datasets show that the method outperforms current state-of-the-art open-set models.
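The abstract does not say which sharpness minimization algorithm is used for training. As a rough illustration only, the sketch below assumes a generic sharpness-aware update: take the gradient at the current weights, step a small distance `rho` along it to an approximate worst-case point, then update the original weights with the gradient measured there. The names `sam_step`, `toy_loss`, `toy_grad`, and the values of `lr` and `rho` are hypothetical and do not come from the dissertation.

```python
import math


def sam_step(weights, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update on a list of floats (hypothetical helper)."""
    # Gradient of the loss at the current weights.
    grad = grad_fn(weights)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # Stationary point: no ascent direction, so leave the weights unchanged.
        return list(weights)
    # Move to an approximate worst-case point inside an L2 ball of radius rho.
    perturbed = [w + rho * g / norm for w, g in zip(weights, grad)]
    # Descend from the ORIGINAL weights using the gradient at the perturbed point.
    sharp_grad = grad_fn(perturbed)
    return [w - lr * g for w, g in zip(weights, sharp_grad)]


def toy_loss(weights):
    # Assumed stand-in for a network loss: a simple quadratic bowl.
    return sum(w * w for w in weights)


def toy_grad(weights):
    # Analytic gradient of toy_loss.
    return [2.0 * w for w in weights]


w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w, toy_grad)
```

In a real detector the quadratic would be replaced by the network loss and the two gradient evaluations by two forward-backward passes per batch, which is the main extra cost of this family of methods.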
Keywords/Search Tags:Video Understanding, Open-Set Temporal Action Detection, Temporal Action Localization, Generalization