Research On Human Action Recognition Based On Deep Learning

Posted on: 2020-04-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Zhou
GTID: 2428330578483375
Subject: Control engineering

Abstract/Summary:
High-definition video surveillance is now widely used in public security systems, in the military, and in monitoring systems for elderly people living alone. In all of these settings the main monitoring target is people, so detecting the human body and detecting human behavior are key to understanding surveillance video. Human behavior detection therefore has considerable economic and practical value and has become a hot topic in intelligent video surveillance research. In recent years, with the development of deep learning and the improved performance of GPUs, which excel at graphics computation, the advantages of deep-learning-based human behavior detection have gradually become apparent. Compared with traditional methods, deep learning algorithms do not require manual extraction of image features, which reduces workload and improves detection robustness. This thesis therefore uses deep learning to detect human behavior. The main research contents are as follows.

First, the motion region is extracted from each video frame to reduce the image size, reduce the number of parameters the network has to train, and improve training efficiency. The thesis studies existing target extraction algorithms and, building on them, makes two improvements.

(1) A moving target detection algorithm based on an improved four-frame difference method and the ViBe algorithm is proposed. ViBe requires little computation, extracts foreground quickly, and is robust, but the way it builds its background model makes it prone to "ghosting". The original algorithm is therefore modified. Because the frame difference method does no background modeling, it does not produce ghosts. The foreground area detected by the improved four-frame difference algorithm is matched, foreground point by foreground point, against the foreground target area detected by ViBe. A parameter N counts the matched foreground points, the ratio of N to the total number of foreground points is computed, and a threshold T is applied to this ratio to decide whether the foreground area is a ghost or real foreground. The experimental results show that the improved algorithm eliminates ghosts quickly, which verifies its effectiveness.

(2) The improved four-frame difference algorithm uses the maximum between-class variance method (Otsu) to select the threshold automatically and obtain the binary frame-difference images. It then applies an OR operation to the binary images to retain more motion information, and combines the result with Canny edge detection to fill in the contour of the moving target.

Second, the thesis constructs a 3D spatio-temporal convolutional network (3D CNN) and trains it on the segmented images. Adjusting the number of training iterations and the number of training samples per batch improves the training result. The experimental results show that the 3D convolutional network achieves a good recognition rate for human behavior in the processed sequences of continuous video frames.

Finally, the thesis uses the 3D/2D hybrid connection of MiCT-Net (Mixed Convolutional Tube): part of that network is taken as the input stage of a ResNet network, and a two-stream network is built on this basis. The spatio-temporal features extracted by the two streams are fused by a convolution operation to obtain fused features that contain both spatial and temporal information, and a 3D convolutional network performs the final feature extraction on the fused features. To train the designed network, the temporal network and the spatial network are first pre-trained on the ImageNet dataset and then trained on the dataset built for this thesis. The experimental results show that the network designed in this thesis is effective for human behavior detection.
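
The ghost test in point (1) of the abstract can be illustrated with a short sketch. This is not the thesis implementation. The function names, the representation of each foreground mask as a set of (row, col) pixel coordinates, the default value of T, and the rule that a ratio below T marks a ghost are all assumptions made for illustration. The abstract only states that the number of matched foreground points N is divided by the total number of foreground points and compared with a threshold T.

```python
from typing import Set, Tuple

Point = Tuple[int, int]  # (row, col) pixel coordinate; representation is an assumption


def matched_ratio(vibe_region: Set[Point], diff_foreground: Set[Point]) -> float:
    """Return N / total for one ViBe foreground region.

    N is the number of points in the ViBe region that the frame-difference
    mask also marks as foreground; total is the size of the ViBe region.
    """
    if not vibe_region:
        return 0.0
    n_matched = len(vibe_region & diff_foreground)  # N in the abstract
    return n_matched / len(vibe_region)


def is_ghost(vibe_region: Set[Point],
             diff_foreground: Set[Point],
             threshold_t: float = 0.2) -> bool:
    """Flag a ViBe region as a ghost when too few of its points show motion.

    Assumption: a ghost has no real motion, so the frame difference marks few
    of its points and the ratio falls below T. The value 0.2 is hypothetical.
    """
    return matched_ratio(vibe_region, diff_foreground) < threshold_t


# Toy example: one region that overlaps the motion mask and one that does not.
real_region = {(0, 0), (0, 1), (1, 0), (1, 1)}
ghost_region = {(5, 5), (5, 6), (6, 5), (6, 6)}
motion_mask = {(0, 0), (0, 1), (1, 0), (9, 9)}

print(is_ghost(real_region, motion_mask))   # False: 3 of 4 points matched
print(is_ghost(ghost_region, motion_mask))  # True: no points matched
```

In a real pipeline the regions would come from connected components of the ViBe mask, and a region flagged as a ghost would be removed from the foreground so that the background model can absorb it.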
Keywords/Search Tags: Four-frame difference, ViBe algorithm, Deep learning, Action recognition, Spatio-temporal feature fusion