
Research On Human Action Recognition Method Integrating Visual Attention Mechanism And Deep Learning

Posted on: 2020-02-13, Degree: Master, Type: Thesis
Country: China, Candidate: L Chen
GTID: 2428330596478963, Subject: Biomedical engineering
Abstract/Summary:
Human action recognition in video is one of the most important topics in computer vision and is widely used in video surveillance, video retrieval and human-computer interaction, so it has attracted the attention of many researchers. As deep learning has spread to various fields, researchers have applied it to recognizing and classifying human actions in videos, but most algorithms give unsatisfactory results in scenes with complex backgrounds, multiple targets or heavy interference. To address these situations, this thesis returns to the original intention of computer vision, simulating the information processing mechanism of the human visual system, and proposes a human action recognition system that integrates deep learning with a visual attention mechanism.

Firstly, a method for extracting spatial and temporal features is given, based on the spatio-temporal convolutional long short-term memory (ConvLSTM) unit. ConvLSTM is an efficient combination of Convolutional Neural Networks and Long Short-Term Memory (LSTM) networks. It keeps the advantage of Convolutional Neural Networks, which extract spatial location features by simulating the receptive field properties of the visual system, and it also retains the associative memory function of LSTM on long sequence problems. This structure can therefore extract spatial and temporal information from videos simultaneously, so that motion information across video frames is not lost, which improves the recognition performance of the system to a certain extent.

Secondly, based on the basic principles and information processing of the human visual attention mechanism, the AttenLSTM unit is established by combining the deep learning attention mechanism with LSTM. The essence of the unit is to simulate human visual attention: it scans each position of the visual information selectively and then gives more attention to the area where the human action occurs. This attention is expressed in vector form. Each element of the vector takes a value between 0 and 1, and the elements are assigned simultaneously to different positions. The original feature vectors are then weighted by these position weights and summed.

Finally, the network framework is built by modifying the common encoder-decoder LSTM model. Because the research object is video, which contains both image information and time series information, an ordinary LSTM cannot complete the encoding and decoding process. Therefore, the encoding stage uses the ConvLSTM structure, built on the Darknet network model, to encode the spatio-temporal information contained in the video, and the decoding stage uses the AttenLSTM structure. Owing to its weight distribution method, the AttenLSTM structure enhances the relevance of context information on the one hand and reduces the processing of redundant information on the other.

To verify the validity of this model, a series of performance experiments was conducted on the KTH and UCF101 databases, which are both highly representative. The KTH database has simple backgrounds, little interference and only six types of action. The UCF101 database has complex backgrounds, more occlusion interference and 101 types of action. Analysis of the experimental results indicates that the proposed video human action recognition system, which integrates the deep learning attention mechanism, achieves good recognition performance.
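
The attention weighting described in the abstract (a vector of values between 0 and 1 that is used to form a weighted sum of the original feature vectors) can be illustrated with a minimal sketch. This is not the AttenLSTM unit from the thesis. It assumes the common softmax form of soft attention, and the function name `soft_attention` and its inputs `features` and `scores` are hypothetical names chosen for illustration.

```python
import math


def soft_attention(features, scores):
    """Illustrative soft attention over K positions (assumed softmax form).

    features: list of K feature vectors, each a list of D floats.
    scores:   list of K unnormalized relevance scores, one per position.
    Returns (weights, context): weights lie in [0, 1] and sum to 1;
    context is the weighted sum of the feature vectors.
    """
    if not features or len(features) != len(scores):
        raise ValueError("features and scores must be non-empty and equal in length")

    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the original feature vectors, one dimension at a time.
    dim = len(features[0])
    context = [
        sum(w * vec[d] for w, vec in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Three positions with 2-dimensional features; the second position
    # has the highest score and therefore receives the largest weight.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w, ctx = soft_attention(feats, [0.1, 2.0, 0.3])
    print(w, ctx)
```

In a recurrent decoder the scores would typically be computed from the previous hidden state and the encoder features at each time step; that scoring function is left out here because the abstract does not specify it.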
Keywords/Search Tags: deep learning, attention mechanism, human action recognition, convolutional neural network, long short-term memory network