Research On Video Action Detection Based On Sensitive Feature Selection And Action Region Enhancement

Posted on: 2020-07-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
GTID: 2428330599958956
Subject: Electronics and Communications Engineering
Abstract/Summary:
Video action detection, which comprises action classification and localization, is a fundamental task in computer vision. Specifically, the algorithm must find the start and end time of each action instance in a video and assign a category label to each instance. Video action detection plays a key role in many practical applications, e.g. intelligent monitoring, video retrieval, somatosensory games, healthcare and intelligent device control. Although deep learning has brought tremendous progress in classifying short, trimmed videos, video action detection remains a much more challenging problem, since most real-world videos are long and untrimmed. Recent state-of-the-art methods focus on generating more accurate action proposals and on training better classifiers and regressors. To make full use of the main action area in a video and to deal with the inherent discrepancy between action classification and localization, we propose a multi-task framework that learns to enhance the action region adaptively and to select sensitive features autonomously. In summary, the main contributions of this thesis are as follows:

(1) Based on action region selection, an adaptive region enhancement method is proposed. The core of this method is to let the network attend to the action area of the video, enhancing the contribution of the action area to the detection task while suppressing the influence of areas unrelated to the action. Concretely, the network learns to focus on the subject area of the video automatically via a well-designed adversarial training strategy and loss function. Furthermore, we introduce a mask mechanism that explicitly guides the network to increase the contribution of the main action area for better recognition.

(2) A sensitive feature selection method is proposed. Our motivation is that the selection of key frames is crucial for action classification, whereas the relation among frames is essential for action localization. To handle this internal difference between the two tasks, we propose a sensitive feature selection method consisting of two sub-modules, one for choosing key frames and one for learning the relation among frames. Specifically, the former scores the importance of each frame in the video, and the latter models the correlation between each pair of frames. Experimental results show that the proposed module is well suited to the demands of real-world video action detection.

Based on the above designs, the final model proposed in this thesis achieves 38.97% mAP@0.5 on the THUMOS14 dataset, which outperforms the base model (SSN+BSN) by more than 2% and surpasses the baseline method (SSN) by about 5%. In addition, we observe consistent performance gains on various base networks equipped with our proposed module.
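To make the reported metric concrete: mAP@0.5 is conventionally computed by counting a predicted segment as correct when its label matches a ground-truth instance and their temporal intersection-over-union (tIoU) is at least 0.5. The following is a minimal sketch of that matching criterion only, not the thesis's evaluation code. The function names `temporal_iou` and `is_true_positive` and the example segments are hypothetical, and a full mAP computation would also rank predictions by confidence and match each ground-truth instance at most once, which is omitted here.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) segments given in seconds."""
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Length of the overlap; zero when the segments do not intersect.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    # Union = sum of both lengths minus the overlap counted twice.
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, pred_label, gt, gt_label, threshold=0.5):
    """Hypothetical helper: a detection counts as correct when the labels
    agree and the tIoU reaches the threshold (0.5 for mAP@0.5)."""
    return pred_label == gt_label and temporal_iou(pred, gt) >= threshold


if __name__ == "__main__":
    # Illustrative segments only; these are not taken from THUMOS14.
    print(temporal_iou((10.0, 20.0), (12.0, 20.0)))
    print(is_true_positive((10.0, 20.0), "dive", (15.0, 25.0), "dive"))
```

Under this criterion a prediction with the right label but poorly localized boundaries is still counted as a miss, which is why localization quality matters as much as classification for the reported score.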
Keywords/Search Tags: Video action detection, Adaptive area enhancement, Sensitive feature selection, Key frame, Graph convolution