Research On Weakly Supervised Temporal Action Detection Algorithm

Posted on: 2022-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: S J Duan
Full Text: PDF
GTID: 2518306512971879
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In recent years, with the explosive growth in the number of videos, algorithms for video content understanding have been extensively studied. Video content understanding currently covers tasks such as action recognition, temporal action detection, and video caption generation. Among them, temporal action detection refers to locating the start and end times of action segments in an untrimmed video and classifying each action. Fully supervised temporal action detection algorithms require detailed annotation of video data, and the extremely high labeling cost limits their application in real-world scenarios. Weakly supervised temporal action detection algorithms need only video-level category labels, which are cheap and easy to obtain, so they are of great practical significance. This thesis therefore studies weakly supervised temporal action detection and aims to improve the performance of weakly supervised temporal action detection models with the proposed algorithms.

The durations of action segments in untrimmed videos vary widely, and weakly supervised methods have no prior information from precise temporal annotations, which makes it difficult to detect action segments completely and accurately. To address this problem, this thesis proposes a temporal feature fusion module that extracts contextual information between feature segments and fuses features carrying different temporal information to improve the completeness of action segment detection. In addition, to fully exploit the complementarity between RGB features and Flow features and improve the accuracy of temporal boundary localization, this thesis proposes a two-stream feature selection module, which better combines the advantages of the two streams and thereby improves the performance of the weakly supervised temporal action detection model.

Untrimmed videos contain not only action segments but also a large number of background frames. However, weakly supervised methods rely only on video-level category labels in the training stage and ignore background information, so the model easily misdetects background frames as action segments. To solve this problem, this thesis proposes a multi-branch background suppression network, which mainly consists of a multi-branch basic module and a multi-branch suppression module. By using different supervision signals, the multi-branch suppression module suppresses the activation values of background frames in the temporal activation sequence, thereby reducing false detections and improving the accuracy of the weakly supervised temporal action detection model.

Finally, the effectiveness of the proposed algorithms is verified through comparative experiments and ablation experiments on two widely used datasets, THUMOS14 and ActivityNet1.3.
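To make the role of background suppression in a temporal activation sequence concrete, here is a minimal toy sketch in Python. It is not the thesis's network. The function names, the hand-written foreground weights, and the fixed threshold are all assumptions introduced for illustration. It shows only the general idea that lowering the activation of background snippets before thresholding can remove a false detection.

```python
from typing import List, Tuple


def suppress_background(activations: List[float],
                        foreground_weights: List[float]) -> List[float]:
    """Scale each snippet's activation by a foreground weight in [0, 1].

    Hypothetical stand-in for a learned suppression branch: a weight near 0
    marks a snippet as background and pushes its activation toward 0.
    """
    if len(activations) != len(foreground_weights):
        raise ValueError("inputs must have the same length")
    return [a * w for a, w in zip(activations, foreground_weights)]


def extract_segments(activations: List[float],
                     threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) snippet indices, end exclusive, of runs above threshold."""
    segments = []
    start = None
    for t, value in enumerate(activations):
        if value > threshold and start is None:
            start = t  # a new above-threshold run begins
        elif value <= threshold and start is not None:
            segments.append((start, t))  # the run ended at the previous snippet
            start = None
    if start is not None:
        segments.append((start, len(activations)))  # run reaches the end
    return segments


# Toy temporal activation sequence for one action class (8 snippets).
tas = [0.1, 0.7, 0.8, 0.6, 0.7, 0.2, 0.9, 0.1]
# Assumed foreground weights: snippet 6 is treated as background.
fg = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 1.0]

before = extract_segments(tas, threshold=0.5)
after = extract_segments(suppress_background(tas, fg), threshold=0.5)
```

In this toy input, thresholding the raw sequence yields two segments, while thresholding the suppressed sequence keeps only the first. In an actual model the foreground weights would be learned from video-level labels rather than written by hand.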
Keywords/Search Tags:Weakly supervised, Temporal action detection, Two-stream features, Background suppression