| Human action recognition has long been an active research direction in artificial intelligence. It is widely used in smart homes, virtual reality, smart security, video surveillance, and other fields. However, under low illumination, especially at night in rural areas and in other relatively dark environments, dangerous behaviors such as crimes are more likely to occur. In such environments, surveillance cameras cannot capture a clear picture, and action recognition algorithms cannot work effectively. In autonomous driving, it is necessary to identify pedestrian behaviors and their trends across a variety of environments and weather conditions. Under low illumination, a clear picture cannot be captured, so safe driving cannot be guaranteed. Current human action recognition algorithms are designed for visible-light images. Multispectral imaging technology provides not only RGB images but also valuable far-infrared images. The physical characteristics of this imaging enable all-weather, low-interference detection, and the technology has broad prospects in national defense monitoring and vehicle-assisted driving. To address these problems, this paper first uses the multimodal data provided by a multispectral device to detect pedestrians in RGB and far-infrared images. Since each subtask of skeleton-based action recognition is based on convolutional neural networks, we also study a pruning algorithm for convolutional neural networks. Then, human key points are detected within the pedestrian bounding boxes obtained in the first step. Finally, the skeleton sequence formed by the human key points is used to recognize human behavior. The main research contents of this paper are as follows: 1. Research on a pedestrian detection network based on Transformer feature fusion and a histogram layer. Two networks, FTHd (Day Network of Fusion Transformer and Histogram Layer) and FTn (Night Network of Fusion
Transformer), are proposed for the characteristics of daytime and nighttime images, respectively. In the daytime, the texture features of RGB images are more pronounced. We first add a histogram layer to the input branch of the detection network and then concatenate the features from different receptive fields. Finally, we add the cross-modal feature fusion module CFT to the front end of the network so that the features are fused and can interact. Using the Transformer’s self-attention mechanism, the network robustly captures the latent interactions between RGB features and far-infrared features. At night, light is very weak and the far-infrared image plays the key role, but its texture information is weak. Because the two-stream features can be fully extracted by the VGG network, we apply a convolution at the VGG Conv4-3 layer. The features of the two streams are combined into one stream, which not only improves detection accuracy on the nighttime dataset but also greatly reduces the number of network parameters. Finally, we add a CFT module to the front end of the VGG network for feature fusion, performing intra-modal and inter-modal fusion at the same time. 2. Research on a generalized pruning algorithm for convolutional neural networks. Each subtask of action recognition is based on convolutional neural networks, so we propose a generalized pruning algorithm for them. First, for all feature layers of the network, pruning is guided by feature deconvolution visualization: the contribution rate of each feature map in each layer and the similarity between each pair of feature maps are calculated, and feature maps with a low contribution rate and high similarity are pruned. After pruning, the network parameters are fine-tuned, and after repeated pruning iterations the final compact model is generated. In addition to the speed improvement, the pruned model also shows some improvement
in accuracy. The feature deconvolution pruning method proposed in this paper is a general pruning method that can be applied to other similar network pruning tasks and is not limited by the type of input image or the network structure. 3. Research on far-infrared human key point detection. Human key point detection under low illumination is the main problem addressed in this part. This paper takes the novel approach of using far-infrared images to extract human key points and proposes a new attention-oriented two-stage lightweight convolutional neural network, LMANet. The network consists of two stages. The first stage uses a lightweight depthwise separable residual module to capture local details of key points without requiring many complex convolutional layers. The second stage enlarges the receptive field and estimates poorly recognized key points through the contextual relationships between key points. Since there is no public dataset for far-infrared human key point detection, we select 700 images from a public far-infrared pedestrian detection dataset, annotate the human key points, and make them public for other researchers to use. 4. Human behavior is recognized using the human key point sequence extracted from far-infrared images. The main work is divided into two parts: (1) First, the Conv-Shift-Conv (CSC) module is introduced into the network structure. Then, for the CSC module, the shift module in Shift-GCN is replaced with a sparser shift module, and the resulting network is named Sparse Shift-GCN. The proposed network reduces feature redundancy, prevents overfitting, and improves the generalization ability of the model. Finally, OHEM Loss is introduced into the proposed model. The accuracy of the proposed model on 4 different streams improves to varying degrees, which improves the overall performance of the network. (2) On the basis of Sparse Shift-GCN, it is proposed to set the number of inputs and outputs of
each layer of the network to an integer multiple of the number of joint points, yielding the integer-multiple sparse network Int Sparse-GCN. Next, we analyzed the mask function in Shift-GCN and found that more than 80% of the mask function was ineffective. To address this, an automated traversal method was designed to obtain the optimized parameters with the highest accuracy. |
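
The selection rule in item 2 (prune feature maps with a low contribution rate and high similarity) can be illustrated with a minimal sketch. It is not the paper's method: the abstract derives contribution rates from feature deconvolution visualization, whereas this sketch assumes a simple stand-in (each map's share of total absolute activation), assumes cosine similarity as the similarity measure, and assumes a map is pruned if it meets either condition. All function names and thresholds are hypothetical.

```python
import math


def contribution_rates(feature_maps):
    """Assumed proxy: each map's share of the total absolute activation."""
    energy = [sum(abs(v) for v in fm) for fm in feature_maps]
    total = sum(energy) or 1.0
    return [e / total for e in energy]


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_maps_to_prune(feature_maps, min_contribution=0.05, max_similarity=0.95):
    """Return indices of maps that contribute little or duplicate a kept map.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    rates = contribution_rates(feature_maps)
    # Visit maps from most to least contributing, so that of two
    # near-duplicate maps the stronger one is kept.
    order = sorted(range(len(feature_maps)), key=lambda i: rates[i], reverse=True)
    kept, pruned = [], []
    for i in order:
        low = rates[i] < min_contribution
        redundant = any(
            cosine_similarity(feature_maps[i], feature_maps[j]) > max_similarity
            for j in kept
        )
        if low or redundant:
            pruned.append(i)
        else:
            kept.append(i)
    return sorted(pruned)


# Four flattened feature maps: map 0 is a scaled copy of map 1,
# and map 3 carries almost no activation.
maps = [
    [1.0, 0.0, 1.0, 0.0],
    [2.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 3.0],
    [0.01, 0.0, 0.0, 0.01],
]
to_prune = select_maps_to_prune(maps)
```

In the loop described in the abstract, a selection step like this would alternate with fine-tuning of the remaining parameters until the model is compact enough.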