| With urban development, rising population density, and growing pressure on city management, demand for security monitoring keeps increasing. Automatic recognition and screening of violent acts in video is an effective means of protecting people’s lives and property. Driven by the strong performance of deep learning on image tasks, video action recognition continues to make new breakthroughs. This thesis aims to improve the two-stream convolutional neural network used for deep-learning-based action recognition. A residual network replaces the original basic feature extraction network, a convolutional block attention module improves the extraction of key feature information from the image, and a bidirectional gated recurrent unit models long-term video sequence information; together these changes improve the accuracy of action recognition. The main work and contributions of this thesis are as follows. (1) In the original two-stream architecture, the basic feature extraction network is too shallow to extract sufficient features from the image. To address this, a method is proposed that replaces the original branch networks with different deep neural networks. The experimental results show that a deep residual network effectively enhances the feature extraction capability of the two-stream convolutional neural network, and ResNet50 is adopted as the base network for the subsequent improvements. (2) Existing deep neural networks apply the same convolution to all image features and therefore cannot fully learn the features that are key to the action being recognized. To address this, a convolutional block attention module is added to the branch networks of the two-stream network to strengthen the learning of key features. The
effectiveness of the convolutional block attention module is verified by comparing recognition results with the module inserted at different positions in the network. (3) The two-stream network is weak at modeling long-duration action information in video. To address this, recurrent neural network units are added to the branch networks to model the video sequence. The network takes as input an image sequence obtained by global segmented sparse sampling, and the recognition results of the gated recurrent unit and the bidirectional gated recurrent unit are compared. The results show that the bidirectional gated recurrent unit effectively models temporal information both before and after each time step and improves the accuracy of action recognition. (4) No public dataset supports model training for ATM violent action recognition. To meet the practical needs of this application, an ATM-scene action dataset was built and split 3:1 into training and test sets. Using the improved two-stream network trained on this self-built dataset, an ATM security monitoring system is designed and implemented. The system can accurately judge the action category in video clips and visualize the recognition results. It also stores recognition records and supports querying them, which makes follow-up review of surveillance video more efficient. Finally, the system is tested on different video clips to verify the effectiveness of the algorithm. |
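
The global segmented sparse sampling mentioned in contribution (3) can be illustrated with a minimal sketch. The abstract does not describe the sampling procedure, so the code below assumes the common scheme of dividing the video into equal-length segments and taking one frame from each: a random frame during training, the middle frame otherwise. The function name `sparse_sample_indices` and its parameters are hypothetical and not taken from the thesis.

```python
import random


def sparse_sample_indices(num_frames, num_segments, train=True, rng=None):
    """Pick one frame index from each of `num_segments` spans of the video.

    Assumed scheme (not confirmed by the thesis): segments tile the whole
    video, so the sampled frames cover it globally while staying sparse.
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Integer bounds of segment k; together the segments cover [0, num_frames).
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if end <= start:
            # Video shorter than num_segments: the segment is empty, reuse a valid frame.
            indices.append(min(start, num_frames - 1))
        elif train:
            # Training: a random frame inside the segment adds temporal jitter.
            indices.append(rng.randrange(start, end))
        else:
            # Evaluation: the middle frame of the segment, for determinism.
            indices.append((start + end - 1) // 2)
    return indices
```

For a 90-frame clip and three segments, evaluation mode returns one index from each 30-frame span. The frames at those indices would then form the short sequence passed to the recurrent units.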