| In recent years, artificial intelligence technology has developed rapidly, and computer vision and related technologies have received widespread attention. Among them, human action recognition has become a research hotspot, with high application value in medical diagnosis, intelligent monitoring, and human-computer interaction. Researchers have made progress on this topic, but human action recognition in video still faces many unsolved difficulties, such as severe occlusion of target objects, complex video backgrounds, and changes in camera perspective. These problems make it difficult to further improve recognition accuracy. In the field of human action recognition, traditional hand-crafted feature extraction has reached a bottleneck. Mainstream deep learning methods use convolutional neural networks to simulate the human brain's understanding of video and image information and to learn features autonomously, which greatly improves recognition efficiency and accuracy. To further study human action recognition based on deep learning, the following work has been done. First, to address the fusion of temporal and spatial features and the full use of video time-series information, a model combining a residual two-stream network with attention is proposed. The residual two-stream network fuses the temporal and spatial features of the video, and a Bi-LSTM model is constructed to make full use of the timing information of video frames. An attention model is introduced to assign different weights to the video frame sequence according to the output of the Bi-LSTM network at different time steps, and finally the Softmax function is used to complete the human action recognition task. Second, to fully extract the spatio-temporal feature information of video, an attention-based spatio-temporal fusion network and a bidirectional one-way LSTM model are proposed. An attention mechanism is introduced on the basis of the spatio-temporal fusion network. To better capture global information, the bidirectional one-way LSTM model is applied, and the Softmax function is used to complete the recognition task. Finally, the two models are trained and tested on the UCF101 and HMDB51 datasets, respectively, and the experimental results are analyzed. The results show that both models are robust and improve the accuracy of action recognition. Figures: 39; Tables: 7; References: 48... |
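
The attention step described above (weighting the frame sequence by the Bi-LSTM outputs, then classifying with Softmax) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how attention scores are computed, so the dot product with a scoring vector is an assumption, and `frame_outputs`, `score_vector`, `class_weights`, and `class_bias` are hypothetical toy values standing in for Bi-LSTM outputs and learned parameters.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(frame_outputs, score_vector):
    """Score each frame output, normalise the scores over time with
    softmax, and return the weights and the weighted sum of outputs."""
    scores = [dot(h, score_vector) for h in frame_outputs]
    weights = softmax(scores)
    dim = len(frame_outputs[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, frame_outputs))
        for i in range(dim)
    ]
    return weights, context


def classify(context, class_weights, class_bias):
    """Linear layer followed by softmax over action classes."""
    logits = [dot(row, context) + b for row, b in zip(class_weights, class_bias)]
    return softmax(logits)


# Hypothetical toy data: 3 frames, 2-dimensional outputs per frame
# (stand-ins for Bi-LSTM outputs; real features would be much larger).
frame_outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score_vector = [1.0, 1.0]                  # assumed learned scoring vector
class_weights = [[1.0, 0.0], [0.0, 1.0]]   # assumed classifier weights
class_bias = [0.0, 0.0]

weights, context = attention_pool(frame_outputs, score_vector)
probs = classify(context, class_weights, class_bias)
```

In this toy case the third frame has the highest score, so it receives the largest attention weight; in a trained model the scoring parameters would be learned jointly with the rest of the network.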