
Human Action Recognition Method Based On DenseNet And Multi-Scale Temporal Information

Posted on: 2021-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Jin
GTID: 2568306632966899
Subject: Control engineering
Abstract/Summary:
In recent years, with the rapid development of computer vision, techniques such as object detection, pose estimation, face recognition and action recognition have made great progress and are gradually being applied in daily life, making it more convenient. Among them, action recognition has especially wide application value: it plays an important role in intelligent surveillance, human-computer interaction, video retrieval, automatic recognition and alarm systems, public safety and many other fields. Because human behavior in video is complex, and because of problems such as interference from the background and camera shake, improving the accuracy of human action recognition in video remains a major challenge. This thesis studies human action recognition algorithms in depth. The main work is as follows.

First, we explored the application of 2D and 3D convolution to human action recognition, using a two-stream convolutional network and the C3D convolutional network respectively, and verified the performance of both networks on the UCF101 dataset.

Second, because computing and storing optical-flow features consumes substantial computation and storage, and because the C3D network has relatively few layers, we combined 3D convolution with 2D convolution to form a hybrid convolutional network that improves on the performance of C3D.

Third, to extract deeper features from video, we drew on the DenseNet architecture and built dense blocks that establish dense connections between layers, which enables feature reuse and makes feature extraction more efficient. At the same time, the network was made deeper, and the nonlinear transformation inside each dense block uses hybrid convolution, which improves the ability of the 3D convolutional layers to extract temporal information.

Finally, because the motion of people in a video is not evenly distributed over its duration, we drew on the Inception architecture and added 3D convolution kernels with different temporal depths to the transition layer, applying them in parallel. This design simulates a 3D convolutional layer with variable temporal depth that can model consecutive video frames over short, medium and long time spans, so the network can capture important temporal information that a single fixed temporal depth would miss. We call the modified layer the multi-scale temporal transition layer. Replacing the original transition layers of the DenseNet-based deep hybrid convolutional network with multi-scale temporal transition layers leaves the depth of the network unchanged but increases its width, and it improves recognition accuracy significantly. In a comparison with current human action recognition methods, the proposed approach performed best.
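The abstract does not give layer counts, growth rates, or the exact form of the hybrid convolution, so the following is only a minimal bookkeeping sketch under stated assumptions. It reads the "hybrid" layer as a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution (a P3D/R(2+1)D-style factorization swapped in for illustration; the thesis may combine 2D and 3D convolution differently), and it uses a hypothetical dense block with 64 input channels, growth rate 32 and four layers. It shows how dense connections grow each layer's input width through concatenation, and compares the weight count of plain 3×3×3 layers with the assumed hybrid layers. All function names are hypothetical.

```python
from typing import Callable


def conv3d_weights(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weight count of a bias-free 3D convolution with kernel (kt, kh, kw)."""
    return c_in * c_out * kt * kh * kw


def full_3d_layer(c_in: int, growth: int, k: int = 3) -> int:
    """Plain k x k x k 3D convolution that outputs `growth` channels."""
    return conv3d_weights(c_in, growth, k, k, k)


def hybrid_layer(c_in: int, growth: int, k: int = 3) -> int:
    """ASSUMED hybrid layer: 1 x k x k spatial conv, then k x 1 x 1 temporal conv."""
    spatial = conv3d_weights(c_in, growth, 1, k, k)
    temporal = conv3d_weights(growth, growth, k, 1, 1)
    return spatial + temporal


def dense_block(c0: int, growth: int, num_layers: int,
                layer_fn: Callable[[int, int], int]) -> tuple[list[int], int, int]:
    """Return (input channels per layer, block output channels, total weights).

    Layer i receives the concatenation of the block input and the outputs of
    all i earlier layers, so its input width is c0 + i * growth.
    """
    inputs: list[int] = []
    channels, total = c0, 0
    for _ in range(num_layers):
        inputs.append(channels)
        total += layer_fn(channels, growth)
        channels += growth  # concatenate this layer's new feature maps (feature reuse)
    return inputs, channels, total


if __name__ == "__main__":
    # Hypothetical settings: 64 input channels, growth rate 32, 4 layers.
    for name, fn in [("full 3D", full_3d_layer), ("hybrid", hybrid_layer)]:
        inputs, out_c, weights = dense_block(64, 32, 4, fn)
        print(f"{name}: layer inputs={inputs}, output channels={out_c}, weights={weights}")
```

With these hypothetical settings the four layers receive 64, 96, 128 and 160 input channels, the block emits 192 channels, and the assumed hybrid layers need 141,312 weights against 387,072 for plain 3×3×3 layers. That saving is one reason a factorized layer makes a deeper 3D network more affordable.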
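As the abstract describes it, a multi-scale temporal transition layer runs 3D convolutions with different temporal depths in parallel on the same input and combines their outputs. The following sketch tracks only tensor shapes (C, T, H, W) and assumes the branches are concatenated on the channel axis, Inception-style. The temporal depths (1, 3, 5), the 64 channels per branch, the d×1×1 kernel shape and the 2×2×2 average pooling are all hypothetical choices, not values taken from the thesis. Odd depths with symmetric padding keep T unchanged, so the branch outputs can be concatenated. This is how such a layer widens the network without deepening it.

```python
def conv_out(length: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output length of a convolution or pooling window along one axis."""
    return (length + 2 * pad - kernel) // stride + 1


def multi_scale_transition(shape: tuple[int, int, int, int],
                           depths: tuple[int, ...] = (1, 3, 5),
                           branch_channels: int = 64,
                           pool: int = 2) -> tuple[int, int, int, int]:
    """Output shape (C, T, H, W) of a HYPOTHETICAL multi-scale temporal transition layer.

    Each branch is a d x 1 x 1 3D convolution with temporal depth d and symmetric
    'same' padding; all branches see the same input and are concatenated on C.
    """
    if not depths:
        raise ValueError("at least one temporal depth is required")
    _, t, h, w = shape  # input channels affect weight counts, not output shape
    branch_shapes = []
    for d in depths:
        if d % 2 == 0:
            raise ValueError(f"temporal depth {d} is even; symmetric padding cannot keep T fixed")
        t_out = conv_out(t, d, pad=(d - 1) // 2)  # equals t for odd d
        branch_shapes.append((branch_channels, t_out, h, w))
    c_cat = sum(s[0] for s in branch_shapes)  # concatenation makes the layer wider, not deeper
    _, t_b, h_b, w_b = branch_shapes[0]
    # Assumed downsampling: pool x pool x pool average pooling with stride = pool.
    return (c_cat,
            conv_out(t_b, pool, stride=pool),
            conv_out(h_b, pool, stride=pool),
            conv_out(w_b, pool, stride=pool))


if __name__ == "__main__":
    print(multi_scale_transition((256, 16, 28, 28)))
```

For a hypothetical input of shape (256, 16, 28, 28), the call returns (192, 8, 14, 14): three 64-channel branches concatenated, then T, H and W halved by pooling. Each branch of depth d mixes information from d consecutive frames, which corresponds to the short, medium and long temporal spans the abstract mentions.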
Keywords/Search Tags: Human action recognition, 3D convolutional neural network, DenseNet, multi-scale temporal transition layer