
Research On The Method Of Human Action Detection In Videos

Posted on: 2020-02-16  Degree: Master  Type: Thesis
Country: China  Candidate: L Zhao  Full Text: PDF
GTID: 2428330596976176  Subject: Signal and Information Processing
Abstract/Summary:
Human action detection in untrimmed videos is an important yet challenging problem in computer vision and pattern recognition, with applications such as video search and intelligent video surveillance. The task is not only to locate the start point and end point of each action instance in an untrimmed video, but also to recognize the action category. Research in this field has achieved some success through deep learning, but it remains far from ready for practical applications. This thesis studies the problem from two angles: traditional algorithms and deep learning algorithms. The main contributions are as follows:

1. The performance of a traditional human action detection method based on the dense trajectories algorithm is improved by adding spatial features. The dense trajectories algorithm efficiently extracts motion features from videos and is considered the best hand-crafted feature representation in video analysis. Action proposals are generated as candidates by dense temporal sliding windows, dense trajectories features are extracted on these proposals, and the feature vectors are then encoded with the Fisher vector method. In addition, spatial features are generated with a convolutional neural network. The Fisher-vector-encoded dense trajectories features and the spatial features are concatenated into a new video feature representation that improves the performance of the action detector.

2. A residual block with embedded dilated convolution and an attention mechanism is proposed to produce better video proposals. The key feature of the residual network is the skip connection, and the residual network has been highly successful in image recognition because it accelerates convergence and boosts performance. In this thesis, the widely used residual block is modified by adding dilated convolution to extend the receptive field and an attention mechanism to generate better feature maps. A novel start-end-points generating network built from the modified residual blocks is then proposed to obtain more accurate start and end points in untrimmed videos and thereby generate better video proposals.

3. Focal loss is used as the loss function to alleviate the extreme imbalance between positive and negative classes. When the start-end-points generating network is trained, the class distribution is extremely imbalanced, so training is dominated by easily classified negative samples. Focal loss addresses this problem and yields better classification results.

4. A background elimination network is used to remove the disturbance of background in the final classification stage and improve detection performance. Proposal candidates inevitably contain background segments, and feeding such candidates directly into the classifier may produce incorrect results. The background elimination network proposed in this thesis generates a foreground-background mask vector that suppresses the influence of background, so the classifier relies only on the foreground parts of each candidate and produces more reliable results.
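
To make the role of focal loss in the third contribution more concrete, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not taken from the thesis: the function names and the values gamma = 2.0 and alpha = 0.25 are assumptions chosen for illustration, and the abstract does not state which settings were actually used. The sketch only shows how the (1 - p_t)^gamma factor shrinks the loss of easily classified samples far more than that of hard ones.

```python
import math


def cross_entropy(p, y, eps=1e-7):
    """Plain binary cross-entropy for one sample.

    p: predicted probability of the positive class, y: label (0 or 1).
    """
    p = min(max(p, eps), 1.0 - eps)      # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss for one sample.

    gamma and alpha are illustrative defaults (assumptions), not values
    reported in the thesis abstract.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma is close to 0 for well-classified samples,
    # so their contribution to the total loss is strongly reduced.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical samples: an easy negative and a hard positive.
    samples = {"easy negative": (0.05, 0), "hard positive": (0.20, 1)}
    for name, (p, y) in samples.items():
        ce = cross_entropy(p, y)
        fl = binary_focal_loss(p, y)
        print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.6f}, ratio={fl / ce:.4f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when experimenting with different settings.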
Keywords/Search Tags:human action detection, dense trajectory, dilated convolution, attention mechanism, background elimination network