In recent years, with the development of intelligent video surveillance, human-computer interaction, video retrieval, and related applications, human action recognition in video has become a focus of research in computer vision. On the one hand, the most primitive form of action recognition relies on human operators watching video around the clock to judge its categories, which is inefficient and extremely costly in human and financial resources. In addition, as the set of behavior category labels grows ever larger, it is impractical to rely on people to remember tens of thousands of labels. On the other hand, even for the same action, it is difficult to ensure that every classification is accurate, because manual classification involves a degree of subjectivity and randomness. Therefore, there is an urgent need for an action recognition algorithm with high recognition accuracy and automatic feature extraction, and deep learning addresses exactly this need. To extract features efficiently and improve recognition accuracy, this paper proposes an action recognition algorithm based on neural networks and applies it to recognizing human actions in video.

Based on the UCF101 and HMDB51 datasets, this paper studies deep-learning-based action recognition algorithms. It first introduces the basic theory of deep learning in detail, and then describes the origin and development of the two-stream convolutional neural network, which simulates the ventral and dorsal pathways of the visual cortex's processing mechanism by dividing the network into two parts: a spatial stream and a temporal stream. Next, to address the problem that the two-stream network cannot model long-range temporal structure in video well, the classical temporal segment network (TSN) is introduced. Building on this classical model, this paper proposes a feature-propagation-based temporal segment network for action recognition, which mainly includes the following
improvements:

1. A network model with stronger feature-extraction ability is introduced: the Inception V3 architecture is adopted as the backbone of both the spatial-stream and the temporal-stream convolutional networks.
2. In the original temporal segment network algorithm, optical flow cannot be extracted end to end; it must be extracted offline, and slowly. Introducing the FlowNetC network to extract optical flow information further reduces the error rate of optical flow extraction.
3. A feature propagation method is proposed and applied to the temporal segment network, yielding a new network structure, P-TSN. Experiments are carried out on the two datasets above. Key frames are processed by the spatial stream, while features for non-key frames are obtained by feature propagation. The spatial stream and the temporal stream are then fused with a weighted sum and fed into a softmax layer for classification, and finally the number of key frames is determined. The experimental results on these datasets demonstrate the effectiveness of the method.
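The fusion step described in point 3 can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the per-segment averaging (segmental consensus) and the fixed stream weights are assumptions based on the standard TSN formulation, and the weight values here are placeholders.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tsn_two_stream_predict(spatial_scores, temporal_scores,
                           w_spatial=1.0, w_temporal=1.5):
    """TSN-style video prediction.

    spatial_scores, temporal_scores: (num_segments, num_classes) arrays of
    pre-softmax class scores, one row per sampled snippet.

    Each stream first averages its per-segment scores (segmental consensus);
    the two consensus vectors are then combined with fixed weights and passed
    through softmax to obtain class probabilities for the whole video.
    """
    spatial_consensus = spatial_scores.mean(axis=0)
    temporal_consensus = temporal_scores.mean(axis=0)
    fused = w_spatial * spatial_consensus + w_temporal * temporal_consensus
    return softmax(fused)

# Toy example: 2 segments, 3 classes. The temporal stream strongly favors
# class 2, so the fused prediction picks class 2.
spatial = np.array([[1.0, 2.0, 0.0],
                    [1.0, 2.0, 0.0]])
temporal = np.array([[0.0, 0.0, 3.0],
                     [0.0, 0.0, 3.0]])
probs = tsn_two_stream_predict(spatial, temporal)
```

Averaging before the softmax (rather than averaging per-segment probabilities) follows the usual segmental-consensus design; the spatial/temporal weights would in practice be tuned on a validation split.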