
Research On Temporal Action Location Method Combining Light And Heavy Networks In Untrimmed Video

Posted on: 2021-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: W Wang
GTID: 2428330626462853
Subject: Signal and Information Processing
Abstract/Summary:
Temporal action localization is a prerequisite for action recognition: inaccurate localization results degrade the accuracy of subsequent recognition, so efficient and accurate temporal action localization methods are needed. In practice, videos are often untrimmed, meaning that pre-segmented proposals are not given in advance. Quickly locating action proposals of different types and lengths in untrimmed video with complex and variable content therefore has important research and practical significance. Existing temporal action localization methods usually consist of candidate proposal extraction, proposal feature extraction, and action boundary regression. Generating candidate proposals with a sliding window yields a huge number of candidates, which not only places a heavy load on the subsequent feature extraction and regression tasks but also produces many proposals that are only weakly related to the real action segments. In addition, the boundary regression network is a unit-level localization method with coarse granularity, which is not ideal for large-scale video clips.

To solve these problems, this paper proposes a temporal action localization method that combines light and heavy networks. First, a lightweight network performs coarse detection of action proposals; then a heavyweight network performs fine-grained action localization on the detected proposals; finally, the dense prediction information is merged and post-processed with NMS. Specifically, in the proposal detection module, video clips of different scales generated by the sliding window are fed to a ResNet-10-based proposal recognition network for binary classification (0 is a proposal and 1 is a non-proposal, i.e. background). The advantage of using ResNet-10 as the backbone is that the lightweight network can quickly identify which video clips are proposals. To address the inaccurate localization of the detected proposals, this paper designs a ResNet-50-based temporal action regression network that performs frame-level action confidence estimation and position offset prediction. On the one hand, the heavyweight network extracts more discriminative deep features from the video clips; on the other hand, it generates dense category and position predictions, which facilitates better localization of both small-scale and large-scale action proposals.

To evaluate the mAP and AR-AN performance of the proposed light-heavy network method for action localization in untrimmed video, experiments were carried out on the THUMOS-14 dataset. The results show that the method in this paper is 8.49% higher in mAP@0.5 than the TURN (Temporal Unit Regression Network) method proposed by Gao J et al. in 2017 (IEEE) and 1.11% higher in mAP@0.5 than the BSN (Boundary Sensitive Network) method proposed by Lin T et al. at ECCV 2018. The ResNet-10-based temporal action proposal detection method and the TURN method generate about 66,000 and 408,000 candidate proposals respectively (roughly a sixfold difference), yet the method in this paper is still 7.12% higher than TURN in AR-AN@100. In summary, the experimental results show that, compared with existing methods, the method in this paper completes temporal action detection in untrimmed video quickly and with high precision.

As a practical application, this paper applies the research results to the remote rescue of empty-nest elderly people. To address the social problem that empty-nest elderly people cannot be treated in time when dangerous actions such as falls occur, a remote rescue system for the empty-nest elderly based on temporal action localization is developed. The system realizes real-time fall detection and automatic early warning for empty-nest elderly people under monitoring conditions, so that they can get timely assistance from the outside world.
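
The abstract's pipeline relies on two generic building blocks: multi-scale sliding-window clip generation before the lightweight classifier, and NMS over the scored segments at the end. The sketch below illustrates only those two steps in plain Python. It is not the thesis's implementation: the window sizes, the 50% overlap, the IoU threshold, and all function names are assumptions chosen for illustration, and the ResNet-10 and ResNet-50 networks are not modelled.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float]  # (start, end), here measured in frames


def sliding_windows(num_frames: int, window_sizes: Sequence[int],
                    overlap: float = 0.5) -> List[Segment]:
    """Enumerate multi-scale candidate clips over a video.

    window_sizes and overlap are hypothetical settings; the abstract
    does not state the values used in the thesis.
    """
    clips: List[Segment] = []
    for size in window_sizes:
        stride = max(1, int(size * (1.0 - overlap)))
        last_start = max(1, num_frames - size + 1)
        for start in range(0, last_start, stride):
            end = min(start + size, num_frames)
            clips.append((float(start), float(end)))
    return clips


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: Sequence[Segment], scores: Sequence[float],
                 iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring segment, drop heavy overlaps.

    Returns the indices of the kept segments, highest score first.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold
               for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    clips = sliding_windows(num_frames=64, window_sizes=[16, 32])
    print(len(clips), "candidate clips")
    kept = temporal_nms([(0, 10), (1, 11), (20, 30)], [0.9, 0.8, 0.7])
    print("kept indices:", kept)
```

In a pipeline like the one described, the clips from `sliding_windows` would first be filtered by the binary proposal classifier, and `temporal_nms` would be applied to the refined segments and their confidence scores.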
Keywords/Search Tags:video analysis, deep learning, temporal action detection, temporal action regression, action recognition, fine-grained temporal representations