Action recognition is an important branch of computer vision. In recent decades, the performance of computer hardware has improved greatly, and through the sustained efforts of many researchers and engineers, action recognition technology has developed rapidly. Video data grows exponentially every day, creating an urgent need for action recognition algorithms with high accuracy and robustness, and the rapid development and deployment of such algorithms can bring great convenience to people's lives. In this paper, two improved algorithms with different structures are proposed to extract robust features and improve the accuracy of action recognition. The innovations of this paper are as follows:

(1) A mixed convolution residual structure is proposed for action recognition. When applied to video action feature extraction, two-dimensional convolutional neural networks cannot model temporal information, while full three-dimensional convolutional networks are computationally intensive and time-consuming; both problems lead to low action recognition accuracy. The proposed structure addresses them in three steps. First, two-dimensional convolution is applied to the input video frames one by one to model appearance features. Then, a one-dimensional temporal convolution is applied to the sequence of feature maps extracted by the two-dimensional convolution to model motion information. Finally, a three-dimensional convolution is applied to the sequence of feature maps output by the one-dimensional convolution, modeling spatial and temporal information simultaneously. The structure thus retains the speed of two-dimensional convolution while absorbing the strong feature extraction ability of three-dimensional convolution. Experimental results show that the proposed mixed convolution residual structure significantly enhances the transmission of temporal information and extracts more complete and robust video action features, achieving an accuracy of over 90% on the UCF-101 dataset.

(2) For settings in which the time complexity of the algorithm is a secondary concern and recognition accuracy is the priority, an improved two-stream network structure is proposed for action recognition. First, the original two-stream network is deepened by adding layers. Second, the two-dimensional convolution that the temporal stream branch of the original two-stream network applies to optical flow images is replaced by three-dimensional convolution. Finally, the SoftMax output vectors of the two branches are fused by averaging to form the final output of the network. Experimental results show that the proposed method outperforms most two-stream network variants in classification performance. Because the network is deeper and the temporal stream uses three-dimensional convolution to extract motion information, the improved two-stream network extracts more complete features and classifies better, achieving recognition accuracies of 96.1% and 77.2% on the UCF-101 and HMDB-51 datasets, respectively.
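To make the data flow of the mixed convolution residual structure in innovation (1) concrete, here is a minimal plain-Python sketch; it is an illustration, not the authors' implementation. It assumes a single-channel clip stored as nested lists indexed `[t][h][w]`, fixed (not learned) kernels with odd sizes, zero "same" padding, a ReLU after the first two convolutions, and an identity shortcut added after the 3D convolution. All of these choices, and the names `conv3d_same` and `mixed_conv_residual_block`, are hypothetical. The per-frame 2D convolution is written as a 3D convolution with a 1×k×k kernel and the temporal 1D convolution as one with a k×1×1 kernel, so a single routine covers all three steps.

```python
from typing import List

Volume = List[List[List[float]]]  # single-channel clip indexed [t][h][w] (assumption)
Kernel = List[List[List[float]]]  # kernel indexed [kt][kh][kw]; odd sizes assumed


def conv3d_same(x: Volume, k: Kernel) -> Volume:
    """Single-channel 3D cross-correlation with zero 'same' padding."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(k), len(k[0]), len(k[0][0])
    pt, ph, pw = kt // 2, kh // 2, kw // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    for t in range(T):
        for h in range(H):
            for w in range(W):
                acc = 0.0
                for dt in range(kt):
                    for dh in range(kh):
                        for dw in range(kw):
                            tt, hh, ww = t + dt - pt, h + dh - ph, w + dw - pw
                            # Positions outside the clip count as zero padding.
                            if 0 <= tt < T and 0 <= hh < H and 0 <= ww < W:
                                acc += k[dt][dh][dw] * x[tt][hh][ww]
                out[t][h][w] = acc
    return out


def relu(x: Volume) -> Volume:
    return [[[max(0.0, v) for v in row] for row in frame] for frame in x]


def mixed_conv_residual_block(x: Volume, k2d: List[List[float]],
                              k1d: List[float], k3d: Kernel) -> Volume:
    """Hypothetical sketch: 2D per frame -> 1D temporal -> 3D, plus identity shortcut."""
    # Step 1: appearance modeling, a 2D kernel applied frame by frame (1 x kh x kw).
    appearance = relu(conv3d_same(x, [k2d]))
    # Step 2: motion modeling, a 1D kernel along time only (kt x 1 x 1).
    motion = relu(conv3d_same(appearance, [[[w]] for w in k1d]))
    # Step 3: joint spatio-temporal modeling with a full 3D kernel.
    joint = conv3d_same(motion, k3d)
    # Residual connection: add the block input element-wise, then apply ReLU.
    return relu([[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(fa, fb)]
                 for fa, fb in zip(joint, x)])


# Usage: a 3-frame, 1 x 1 "clip"; the temporal kernel of ones mixes neighboring frames.
clip = [[[1.0]], [[2.0]], [[3.0]]]
print(mixed_conv_residual_block(clip, [[1.0]], [1.0, 1.0, 1.0], [[[1.0]]]))
# prints [[[4.0]], [[8.0]], [[8.0]]]
```

In the usage example, each output frame combines its temporal neighbors before the shortcut adds the original frame back, which is the motion modeling that a purely per-frame 2D network lacks. With identity kernels (a single 1 at each kernel center), the block returns twice any non-negative input, because the shortcut adds the unchanged signal to itself.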
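The cost argument behind innovation (1) can be checked with simple arithmetic. The sketch below is an illustration, not taken from the paper. It counts weights and multiply-accumulate operations (MACs) for a single convolution layer; the channel count, kernel size, and clip size are assumed values, and the helper names are hypothetical. Per input-output channel pair, a k×k×k kernel has k³ weights, a per-frame 2D kernel has k², and a temporal 1D kernel has k. For k = 3, a full 3D layer therefore costs 27 against 9 + 3 = 12 for a 2D layer followed by a 1D layer.

```python
def conv_weights(c_in: int, c_out: int, kt: int, kh: int, kw: int) -> int:
    """Weights in one convolution layer with a kt x kh x kw kernel (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def conv_macs(c_in: int, c_out: int, kt: int, kh: int, kw: int,
              t: int, h: int, w: int) -> int:
    """MACs for a stride-1, 'same'-padded layer producing a t x h x w output."""
    return conv_weights(c_in, c_out, kt, kh, kw) * t * h * w


# Hypothetical layer: 64 -> 64 channels, k = 3, 16 frames of 56 x 56 feature maps.
C, K, T, H, W = 64, 3, 16, 56, 56
full_3d = conv_weights(C, C, K, K, K)      # k x k x k kernel
spatial_2d = conv_weights(C, C, 1, K, K)   # 1 x k x k kernel, applied frame by frame
temporal_1d = conv_weights(C, C, K, 1, 1)  # k x 1 x 1 kernel along time

print(full_3d, spatial_2d, temporal_1d)       # prints 110592 36864 12288
print(full_3d / (spatial_2d + temporal_1d))   # prints 2.25
# Same output size, so the MAC ratio equals the kernel-volume ratio 27 / 9.
print(conv_macs(C, C, K, K, K, T, H, W) // conv_macs(C, C, 1, K, K, T, H, W))  # prints 3
```

These counts ignore bias terms, strides, and pooling. They only show why a network built entirely from 3D convolutions is more expensive per layer than one that uses 2D and 1D convolutions for part of its processing.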
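The final step of innovation (2) fuses the two branches by averaging their SoftMax output vectors. A minimal sketch of that fusion step follows. It assumes each stream has already produced a vector of class logits for the same video; the deepened spatial stream and the 3D-convolution temporal stream on optical flow are not modeled, and the function names and logit values are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_logits: List[float],
                   temporal_logits: List[float]) -> Tuple[List[float], int]:
    """Average the SoftMax outputs of the two streams; return fused scores and class."""
    p_spatial = softmax(spatial_logits)    # RGB (spatial) stream
    p_temporal = softmax(temporal_logits)  # optical-flow (temporal) stream
    fused = [(a + b) / 2.0 for a, b in zip(p_spatial, p_temporal)]
    # Predicted class = index of the largest fused score (first index on ties).
    predicted = max(range(len(fused)), key=lambda i: fused[i])
    return fused, predicted


# Hypothetical logits for a 3-class problem where the streams disagree on the top class.
fused, label = average_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
print([round(p, 3) for p in fused], label)  # prints [0.355, 0.577, 0.068] 1
```

Averaging probabilities rather than raw logits puts both streams on the same 0 to 1 scale. In this toy example, the temporal stream's confident vote for class 1 (about 0.91) outweighs the spatial stream's weaker preference for class 0 (about 0.67).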