| In the current era of rapid development and application of artificial intelligence technology, intelligent systems are becoming ubiquitous, and human action recognition is an important research direction in the field. Traditional methods represent human actions through hand-designed feature descriptors and expression operators, which is complex and may represent deep features insufficiently. Even mainstream deep learning methods still face challenges in expressing network features. To address these issues, this paper proposes a three-stream network model for human action recognition, together with an improved version. Both extend the original dual-stream network, which extracts temporal and spatial features, into a three-stream network that can extract deeper and richer characteristics of human actions. The main contributions of this research are as follows: 1. A three-stream human action recognition model is proposed to address the sensitivity to ambient light interference and strong occlusion, the insufficient generalization ability, and the weak robustness of the original dual-stream network. The proposed method adds a frame sequence stream feature extraction network, composed of a long short-term memory (LSTM) network, which extracts human action features on the basis of the spatial and temporal features extracted by the two convolutional neural networks of the original dual-stream network. The method uses OpenCV to split the processed human action video into RGB frames and computes the horizontal and vertical optical flow field images corresponding to the video frames with the Farneback dense optical flow method. The spatial stream feature extraction network extracts the spatial features of human actions from the RGB frames, the frame sequence stream feature extraction network extracts the sequence features of the frames themselves, and the temporal stream feature extraction network extracts the temporal motion characteristics of human actions from the optical flow field images. The model then fuses the spatial stream, temporal stream, and frame sequence stream features with a parallel feature-level fusion method. A multilayer perceptron serves as the action classifier, taking the fused features as input and outputting the final action class. Experimental results show that the proposed three-stream network with the frame sequence feature extraction network achieves good results on human action recognition tasks on the UCF11 and HMDB51 datasets. 2. An improved three-stream network model for human action recognition is proposed to address the limitations of traditional activation functions and feature fusion strategies. The proposed method replaces the traditional activation function with ACON (Activate Or Not), an adaptive activation function that can switch between linear and nonlinear expressions. It also replaces the parallel feature-level fusion strategy with a more flexible feature fusion method that can assign different weights to different features. Experimental results show that the improved model, with the ACON activation function and the flexible feature fusion method, achieves better results than the original three-stream network model. 3. To address the lack of research on infrared human action recognition and the shortage of infrared human action datasets in engineering applications, this paper presents an infrared human action recognition dataset based on common normal and abnormal indoor behaviors. The dataset contains 7 normal behaviors and 5 abnormal behaviors, effectively covering the common human actions that may occur in indoor scenes. Experiments with the proposed three-stream network model and the improved model on this dataset show that both models have good feature expression ability in infrared mode, and that the improvement strategy of the improved model remains effective and achieves excellent results. Figure [49] table [14] reference [117]... |
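
The two changes named in contribution 2 (the ACON activation and a fusion step that weights each stream) can be illustrated with a minimal, dependency-free sketch. It rests on several assumptions that the abstract does not confirm: the ACON-C form of the activation, f(x) = (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x, is used; the fusion is an element-wise weighted sum of equally sized feature vectors; and the weights come from a softmax over per-stream scores. The names `acon_c`, `weighted_fusion`, and `scores` are hypothetical, and in a trained model `p1`, `p2`, `beta`, and the scores would be learned rather than fixed.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def acon_c(x: float, p1: float = 1.0, p2: float = 0.0, beta: float = 1.0) -> float:
    """ACON-C activation (assumed variant; the abstract only says 'ACON').

    beta = 0      -> linear:    (p1 + p2) / 2 * x
    beta -> large -> nonlinear: approaches max(p1 * x, p2 * x)
    """
    d = (p1 - p2) * x
    return d * sigmoid(beta * d) + p2 * x


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(streams, scores):
    """Element-wise weighted sum of per-stream feature vectors.

    streams: one feature vector per stream, all of the same length
    scores:  one score per stream (hypothetical; learned in practice)
    """
    dim = len(streams[0])
    if any(len(s) != dim for s in streams):
        raise ValueError("all stream feature vectors must have the same length")
    if len(scores) != len(streams):
        raise ValueError("need exactly one score per stream")
    weights = softmax(scores)
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(dim)]


if __name__ == "__main__":
    # Toy feature vectors standing in for the three streams' outputs.
    spatial = [1.0, 2.0]
    temporal = [3.0, 4.0]
    frame_sequence = [5.0, 6.0]
    fused = weighted_fusion([spatial, temporal, frame_sequence], [0.2, 1.0, 0.5])
    print([acon_c(v, beta=2.0) for v in fused])
```

With `beta = 0` the activation reduces to a linear map, and as `beta` grows it approaches `max(p1 * x, p2 * x)`, which is the sense in which ACON switches between linear and nonlinear expressions. A weighted sum is only one way to assign different weights to different features; weighted concatenation would be an equally plausible reading of the abstract.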