With the continuous advance of urbanization in our country, urban population density is increasing and the makeup of the population is becoming more complex. To better protect people's lives and property, large numbers of surveillance cameras have been installed in all kinds of settings, and they capture a large volume of video data. A surveillance camera, however, is only a capture device: it cannot identify or analyze abnormal human behavior in the video. Reviewing this video manually would require substantial human and material resources, which is not realistic. In recent years, with the development of deep learning, researchers have proposed many deep-learning-based algorithms for automatically recognizing abnormal human behavior in surveillance video. However, owing to camera angle, occlusion of the human body and other factors, the recognition accuracy of these algorithms is not yet high enough for practical use. This thesis studies abnormal human behavior recognition based on 3D convolution. The main work is as follows:

(1) The overall network follows the structure of the SlowFast model, which has two paths, slow and fast. The slow path captures spatial information, and the fast path captures changes in human motion. The features extracted by the two paths are then fused to recognize human behavior. To train the network, this thesis constructs a small abnormal human behavior data set containing four common abnormal behaviors. This data set is used to train a SlowFast model whose backbone network is 3D-ResNet50, and the effects of the original residual unit structure and the pre-training residual unit structure on recognition accuracy are compared.

(2) To improve recognition accuracy, the 3D-ResNet50 backbone used in the SlowFast model is modified. When the input and output of a shortcut connection in the 3D-ResNet network are inconsistent, a dimensionality reduction and downsampling operation is required. In this thesis, a max pooling layer is added to the downsampling step. This gives the downsampling a criterion for selecting which activations to keep, reduces the loss of useful information and the introduction of noise, and helps the model classify. In addition, the information flow of the 3D-ResNet network is improved: the arrangement of layers in the four main stages of the network is modified, and the residual units in each main stage are divided into three types, namely start, intermediate and end residual units. This change helps improve the learning ability of the model without increasing the complexity of the network. Finally, the h-swish activation function replaces the activation function in the 3D-ResNet50 network. After the replacement, the recognition accuracy of the model improves to some extent.
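Two of the modifications in (2) can be illustrated with a minimal sketch. The h-swish function below uses its standard definition, x * ReLU6(x + 3) / 6. The downsampling comparison is only a one-dimensional analogy: the thesis applies max pooling inside the shortcut of a 3D-ResNet50, and its window size and stride are not stated here, so the window of 2 and the helper names `strided_downsample` and `max_pool_downsample` are assumptions made for illustration.

```python
def relu6(x: float) -> float:
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard swish: x * ReLU6(x + 3) / 6, a piecewise approximation of x * sigmoid(x)."""
    return x * relu6(x + 3.0) / 6.0


def strided_downsample(values, stride=2):
    """Keep every `stride`-th value and discard the rest, regardless of magnitude."""
    return values[::stride]


def max_pool_downsample(values, window=2):
    """Keep the largest value in each non-overlapping window (hypothetical window size)."""
    return [max(values[i:i + window]) for i in range(0, len(values), window)]


if __name__ == "__main__":
    # h-swish is zero for x <= -3 and equals x for x >= 3.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, h_swish(x))

    # Toy activations: plain striding skips the strong response 0.9,
    # while max pooling keeps it.
    signal = [0.1, 0.9, 0.2, 0.0, 0.7, 0.3]
    print(strided_downsample(signal))
    print(max_pool_downsample(signal))
```

In the toy signal, plain striding keeps positions by index only, whereas max pooling keeps the strongest response in each window. This is one way to read the claim that pooling gives the downsampling a selection criterion.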