Research And Implementation Of Video Action Recognition Based On Long-Time Feature Fusion And Attention Mechanism

Posted on: 2020-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: R Z Huan
Full Text: PDF
GTID: 2428330590995744
Subject: Electronic and communication engineering
Abstract/Summary:
With the rapid development of computer vision in recent years and the sharp growth in the volume of video data, visual analysis of video has become a research hotspot. Mainstream convolutional neural network models have limited ability to model long videos, and action recognition methods usually use an average sampling strategy to reduce a long video to a few frames, which inevitably loses information. At the same time, video clips contain many redundant frames, and processing them indiscriminately increases computational cost. How to retain the key information in a video sequence while reducing the amount of data processed is therefore a difficult problem in video action recognition. To this end, this thesis proposes a long-term feature fusion method based on an attention mechanism to improve the effectiveness and accuracy of video action recognition. The main work of this thesis is as follows:

(1) The thesis introduces and analyzes commonly used video action recognition algorithms, reviews the state of research on shallow feature methods and deep feature methods, and reproduces classic video action recognition algorithms. The recognition accuracy of these algorithms is measured, and the advantages and disadvantages of existing recognition algorithms are analyzed.

(2) A feature learning method for video actions is proposed. After a comprehensive analysis of the advantages and disadvantages of current algorithms, a two-stream network model is adopted, which applies a two-stream network over RGB images and optical flow to video action recognition. To further improve recognition accuracy and compensate for the information loss caused by the limited number of video frames used in the two-stream model, the thesis exploits long-term video information: the frames of a long video are divided into several overlapping segments to reduce the information lost through frame sampling. In addition, because consecutive video frames contain redundant information, an attention mechanism assigns different weights to consecutive frames, weighting the influence of each frame on the final decision and making better use of the information in the video frames to improve classification accuracy.

(3) Based on an LSTM (Long Short-Term Memory) network, spatio-temporal modeling is performed on the RGB frames and optical flow frames selected by the attention mechanism above, so that the algorithm captures both the spatial and the temporal information of the action and improves recognition accuracy. In the recognition stage, the thesis fuses the outputs of the spatial and temporal networks over the overlapping video segments to generate spatio-temporal scores for the video, and then combines these scores to obtain the final video classification.

Finally, extensive experiments were carried out on two datasets, UCF101 and HMDB51. The experimental results are compared with current mainstream action recognition methods to verify the effectiveness and superiority of the proposed method.
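The pipeline described above (overlapping segments, attention-weighted frames, and fusion of spatial and temporal scores) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the segment length and stride, the softmax form of the attention weights, and the equal default weighting of the two streams are all assumptions made for illustration. The CNN and LSTM stages are omitted, so per-frame class scores and relevance values are taken as given inputs.

```python
import math
from typing import List, Sequence


def overlapping_segments(num_frames: int, seg_len: int, stride: int) -> List[range]:
    """Split frame indices into fixed-length windows.

    Windows overlap whenever stride < seg_len (hypothetical parameters).
    """
    if num_frames <= 0 or seg_len <= 0 or stride <= 0:
        raise ValueError("num_frames, seg_len and stride must be positive")
    if num_frames <= seg_len:
        return [range(num_frames)]
    starts = list(range(0, num_frames - seg_len + 1, stride))
    # Add a final window so the tail of the video is always covered.
    if starts[-1] + seg_len < num_frames:
        starts.append(num_frames - seg_len)
    return [range(s, s + seg_len) for s in starts]


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax; the outputs sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_scores: Sequence[Sequence[float]],
                   relevance: Sequence[float]) -> List[float]:
    """Weighted average of per-frame class scores.

    `relevance` stands in for the output of a learned attention module
    (assumed here); frames with higher relevance contribute more.
    """
    weights = softmax(relevance)
    return [sum(w * col_val for w, col_val in zip(weights, col))
            for col in zip(*frame_scores)]


def mean_scores(rows: Sequence[Sequence[float]]) -> List[float]:
    """Average class scores over segments."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse_video_scores(spatial_segments: Sequence[Sequence[float]],
                      temporal_segments: Sequence[Sequence[float]],
                      temporal_weight: float = 0.5) -> List[float]:
    """Average each stream over its segments, then blend the two streams.

    The blend weight is an assumption, not a value from the thesis.
    """
    spatial = mean_scores(spatial_segments)
    temporal = mean_scores(temporal_segments)
    return [(1.0 - temporal_weight) * s + temporal_weight * t
            for s, t in zip(spatial, temporal)]
```

In use, `overlapping_segments` would pick the frame windows, `attention_pool` would produce one score vector per segment for each stream, and `fuse_video_scores` would combine them; the predicted class is the index of the largest fused score.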
Keywords/Search Tags: video action recognition, optical flow, LSTM, attention, spatial-temporal information fusion, long-term