| Action recognition is a fundamental task in video understanding and has attracted growing attention because of its high application value. Mainstream action recognition methods mainly target visible light video, while relatively few studies address infrared video. In some scenarios, however, the imaging characteristics of visible light video limit the image quality it can deliver. In a low-light environment, for example, what is happening in the scene can be difficult even for humans to discern. Compared with visible light video, infrared video is insensitive to lighting conditions, has strong anti-interference ability, and offers better imaging quality in low light. Research on infrared video action recognition therefore has great practical significance and application value. In this context, this thesis takes the characteristics of infrared video into account and addresses the shortcomings of existing methods. The specific research work is as follows. Firstly, to address the neglect of global temporal information in existing infrared video action recognition methods, this thesis proposes an action recognition method that combines low-rank decomposition with multi-stream fusion. The method performs low-rank decomposition on the entire video to obtain its global temporal information. By fusing the global temporal information obtained by low-rank decomposition, the local temporal information obtained by optical flow extraction, and the rich spatial information of the original video, the model makes full use of the information in the video and produces more accurate recognition results. Experimental results show that the method achieves 88.75% on the classic infAR dataset, surpassing existing multi-stream fusion methods. The method also improves on the benchmark method by about 3% on a visible light video dataset, which demonstrates its effectiveness and generalization ability. Secondly, the region in which an action occurs in infrared video is irregular and is smaller than the receptive field of the convolution, so the model has to extract the features of interest for a small region from a much larger area. In addition, multi-stream fusion must first extract data of other modalities, which is time-consuming. To address these problems, this thesis proposes an action recognition method based on graph convolution. The method uses graph convolution to adaptively aggregate the features of the irregularly distributed regions in the three-dimensional spatiotemporal structure where the action occurs, so that the model can attend directly and adaptively to the features of the original video. A feature representation that is more closely related to the motion region can greatly improve the recognition accuracy of the model. The method also introduces a feature aggregation module that assigns a different weight to the features of each graph node and adaptively aggregates the features most relevant to the action, yielding more accurate recognition results. Experiments verify the effectiveness of the graph convolution model, which reaches 91.75% on the classic infAR dataset and surpasses the existing methods on that dataset. |
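
As a rough illustration of the first contribution, the sketch below shows one way a low-rank component could be extracted from a whole video and how per-stream predictions could be combined. The abstract does not specify which low-rank decomposition or which fusion rule the thesis uses, so the truncated SVD, the weighted late fusion of class scores, and the names `low_rank_video` and `fuse_scores` are all assumptions made for this example, not the thesis's implementation.

```python
import numpy as np


def low_rank_video(video: np.ndarray, rank: int = 1) -> np.ndarray:
    """Return a rank-`rank` approximation of a (T, H, W) grayscale video.

    Assumption: truncated SVD stands in for the unspecified low-rank
    decomposition described in the abstract.
    """
    t, h, w = video.shape
    # Flatten each frame into one row, giving a T x (H*W) matrix.
    frames = video.reshape(t, h * w).astype(np.float64)
    u, s, vt = np.linalg.svd(frames, full_matrices=False)
    # Keep only the leading `rank` singular components.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return approx.reshape(t, h, w)


def fuse_scores(stream_scores, weights=None) -> np.ndarray:
    """Weighted average of per-stream class scores (hypothetical late fusion)."""
    scores = np.stack(stream_scores)  # shape: (num_streams, num_classes)
    if weights is None:
        # Default to equal weight for every stream.
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    weights = np.asarray(weights, dtype=np.float64)
    # Contract the stream axis, leaving one score per class.
    return np.tensordot(weights, scores, axes=1)
```

In this sketch the low-rank output would feed one stream, with optical flow and the original frames feeding the others; each stream's classifier produces class scores that `fuse_scores` then combines.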