| Video behavior recognition based on deep learning attracts strong interest from researchers at home and abroad. It is widely used in production and daily life, and plays an increasingly important role in intelligent driving, automatic video classification, and advanced human-computer interaction. This paper introduces an attention mechanism into deep neural networks to study human behavior recognition in videos. Experiments are carried out on several datasets, including UCF101, HMDB51, and UCF24. The research content of the paper can be summarized in the following two aspects. (1) Existing video behavior recognition methods ignore the interactions among features during feature extraction, which makes them poor at distinguishing similar actions. Therefore, a human behavior recognition method with a high-order attention mechanism is proposed. A high-order attention module is introduced into a deep convolutional neural network to model and exploit the complex, high-order statistical information in the attention mechanism. The goal of attention is to reallocate the weight of each part of the feature map during training, so as to focus on local fine-grained information, produce discriminative attention proposals, and capture the subtle differences among behaviors. In addition, the influence of two model fusion strategies, weighting and averaging, on the recognition performance of a two-stream convolutional network is explored. Experimental results show that the high-order attention mechanism improves the discrimination of fine-grained actions and the accuracy of action recognition. (2) In a deep neural network, the shallow and deep layers extract different features. Previous methods focused only on the deep features and ignored the contribution of the shallow features to the recognition results. Therefore, a human behavior recognition method based on feature fusion is proposed, in which the features extracted by the deep and shallow networks are combined. Shallow and deep feature extraction networks are designed: the shallow network captures low-level edge contour and texture information; the deep network extracts high-level semantic information about the behavior; a high-order attention mechanism models the interactions among different parts of the feature map; and the features extracted by the shallow and deep networks are fused by addition and by concatenation. The fusion operation complements and combines the features extracted from the deep and shallow layers to obtain a more discriminative feature representation. The experimental results show that fusing deep and shallow features enhances the expressive power of the features, and that the additive fusion strategy achieves better recognition performance. |
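
The two kinds of fusion mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the spatial/temporal naming of the two streams, the weight `w_spatial`, and the toy vectors are all assumptions made for illustration. Score fusion combines the class probabilities of the two streams by averaging or by weighting, and feature fusion combines shallow and deep feature vectors by element-wise addition or by concatenation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits, temporal_logits, w_spatial=0.5):
    """Late fusion of two-stream class scores.

    w_spatial = 0.5 corresponds to plain averaging; any other value in
    [0, 1] gives weighted fusion. The weight is a hypothetical parameter.
    """
    p_s = softmax(spatial_logits)
    p_t = softmax(temporal_logits)
    return [w_spatial * a + (1.0 - w_spatial) * b for a, b in zip(p_s, p_t)]


def fuse_features_add(shallow, deep):
    """Additive fusion: element-wise sum, so both vectors need equal length."""
    if len(shallow) != len(deep):
        raise ValueError("additive fusion requires equal feature dimensions")
    return [a + b for a, b in zip(shallow, deep)]


def fuse_features_concat(shallow, deep):
    """Concatenation (splicing) fusion: the output dimension is the sum."""
    return list(shallow) + list(deep)


def argmax(values):
    """Index of the largest value, i.e. the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy scores for three classes from the two streams (made-up numbers).
spatial = [2.0, 1.0, 0.1]
temporal = [0.5, 2.5, 0.3]

averaged = fuse_scores(spatial, temporal)                 # averaging
weighted = fuse_scores(spatial, temporal, w_spatial=0.8)  # weighting

# Toy shallow and deep feature vectors (made-up numbers).
added = fuse_features_add([1.0, 2.0], [3.0, 4.0])
spliced = fuse_features_concat([1.0, 2.0], [3.0, 4.0])
```

Additive fusion keeps the feature dimension unchanged but requires the shallow and deep features to have the same shape, whereas concatenation accepts different shapes at the cost of a larger feature vector for the classifier that follows.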