| As information technology develops and the pace of life accelerates, the forms of information carriers are becoming increasingly diverse. In particular, video has gradually replaced text as the mainstream information carrier, owing to its richer content, more diverse forms, and faster interpretation. However, real-world videos such as surveillance footage and we-media (user-generated) videos are often long and contain a large amount of information, which poses a new challenge for video-based public security. Traditional video action detection models cannot handle long videos effectively, so temporal action detection algorithms that localize and classify actions in long videos are urgently needed. To address the unclear semantic relations and ambiguous localization of critical segments in existing research on temporal action detection, this paper carries out the following two lines of work. (1) To address the insufficient use of temporal semantics and the unclear semantic relations, this paper proposes an Anchor-free and Long-term Attention Perception Based Video Interactive Action Detection Method (LTAP). In LTAP, a multi-level pyramid module is built to capture video semantic information at different granularities, highlighting actions of different sequence lengths and different types. At the same time, absolute position encoding is used to tag the positions of the multi-level video features, which compensates for the weakness of convolution in modeling temporal semantic order. This resolves the confusion between symmetrical actions such as "open door" and "close door". The second innovation is a dynamic contextual attention module. Unlike the traditional attention mechanism, the dynamic contextual attention module is divided into two parts: the influence of the past on the future, and the influence of the future on the past. These two parts of attention are integrated into the original features and dynamically adjust the degree to which past events affect future events, and vice versa. The model achieves excellent results on the THUMOS14 dataset. (2) To address the unclear key segments and blurred boundaries, this paper proposes an Anchor-free Video Critical Segment Activation Method for Temporal Action Detection (CSA-TAD). In CSA-TAD, a Mixup data augmentation module is first constructed to improve the model's ability to express information by randomly shuffling and fusing videos with short sequence lengths; the labels of the corresponding videos are adjusted accordingly so that the convergence direction of the model remains unchanged. Second, to address the fuzzy localization of action boundaries in traditional temporal action detection models, a boundary attention module is built to focus on the action boundaries in the video, highlighting boundary information and weakening other information. Finally, an action activation module is proposed, which uses contrastive learning to highlight action segments and weaken background segments, thereby spreading the overall training burden of the model, improving its sensitivity to action segments, and optimizing its performance. Experimental results show that the proposed model performs well on both the THUMOS14 and ActivityNet-1.3 datasets. |