Research On Two-stream Neural Network Based Human Action Recognition

Posted on: 2020-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
GTID: 2428330602951902
Subject: Signal and Information Processing

Abstract/Summary:
In recent years, human action recognition has become an active research topic in video surveillance and human-machine interaction. Traditional human action recognition methods mainly construct hand-crafted features and extract hand-crafted descriptors from video. However, hand-crafted features are usually not abstract enough and have limited ability to describe complex action videos. With the development of deep learning, action recognition methods based on deep learning have attracted wide attention from scholars in China and abroad. Compared with traditional hand-crafted features, a deep neural network, with its strong self-learning ability, can abstract the deep information inside the data. Methods based on the two-stream neural network are one of the key directions in deep-learning-based human action recognition. At present, recognition with a two-stream neural network usually focuses on a static stream and a dynamic stream, and this thesis studies both. The dynamic stream obtains motion information by abstracting the optical flow and other motion cues in the video. The static stream obtains static information by abstracting the content of the video frames.

First, a multi-level temporal attention dynamic stream network is studied, which includes a short-term network, a mid-term network, a long-term network and a temporal network. The short-term network takes the optical flow between two consecutive frames as input, aiming to capture short-term motion information between consecutive frames, and constructs a short-term attention mechanism to highlight the contribution of the more important short-term information across the whole video. The mid-term network takes the optical flow stack of multiple consecutive frames as input, aiming to capture mid-term motion information, and likewise constructs a mid-term attention mechanism that highlights the contribution of the more important mid-term information. The long-term network is oriented to the whole video sequence: an LSTM models the video feature sequence over the long term to obtain the temporal information of the whole video. At the same time, a long-term attention network is constructed by introducing a long-term dependence loss. Through this long-term temporal attention mechanism, the network can better represent the long-term motion information of the whole video. Finally, the short-term, mid-term and long-term networks are combined. The final motion features not only combine short-, mid- and long-term motion information, but also focus on the motion information at the more important positions in the video.

Second, a guided attention static stream network is proposed to better capture the static information in action videos. This network extracts more complete information from video frames by combining high-level global features with low-level local features. Furthermore, global attention and local attention are constructed separately to highlight the deep features of important regions. Combining local attention with global attention reduces the background noise in the local attention. At the same time, a guided attention constraint is introduced so that the focus of local attention is consistent with that of global attention. In addition, a complementary classification loss is introduced so that the global and local features emphasize their complementary parts. The static stream thus mines both the similarities and the differences between global and local features, and ultimately obtains more complete and discriminative deep features.

In summary, this thesis studies in depth the dynamic stream network and the static stream network of the two-stream neural network for human action recognition, and proposes a multi-level temporal attention dynamic stream network and a multi-level guided attention static stream network, respectively. Experiments on two challenging standard datasets verify the effectiveness of the proposed methods.
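
The abstract does not say how its temporal attention is computed, so the following is only a minimal sketch of one common form, softmax attention pooling over per-snippet features. It is not the thesis's actual design. The names `temporal_attention_pool` and `scoring_vector`, and the dot-product scoring rule, are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention_pool(
    features: Sequence[Sequence[float]],
    scoring_vector: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool T per-snippet features into one video-level feature.

    features: T snippets, each a D-dimensional feature vector
        (e.g. produced by an optical-flow network).
    scoring_vector: hypothetical learned D-dimensional parameter; the
        score of snippet t is its dot product with this vector
        (an assumption, not taken from the thesis).
    Returns the T attention weights and the pooled D-dimensional feature.
    """
    # One scalar importance score per snippet.
    scores = [
        sum(f * w for f, w in zip(feat, scoring_vector)) for feat in features
    ]
    # Normalise scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(features[0])
    # Attention-weighted sum over time: important snippets contribute more.
    pooled = [
        sum(weights[t] * features[t][d] for t in range(len(features)))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # Three toy snippet features; the second has the largest score.
    snippet_features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
    weights, pooled = temporal_attention_pool(snippet_features, [1.0, 1.0])
    print("attention weights:", weights)
    print("pooled feature:", pooled)
```

When all snippets receive the same score, the weights become uniform and the pooled feature reduces to a plain temporal average, so the attention weights can be read as a learned departure from average pooling.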
Keywords/Search Tags: Two-stream Network, Deep Learning, Attention Mechanism, Human Action Recognition