
Research And Implementation Of Key Techniques For Human Action Recognition Based On Deep Learning

Posted on: 2021-04-24 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Li | Full Text: PDF
GTID: 2428330611465564 | Subject: Computer technology
Abstract/Summary:
Human action recognition, especially video-based human action recognition, has been one of the most active research areas in computer vision in recent years. It is widely used in intelligent monitoring, intelligent security, virtual reality, and human-machine interaction and cooperation, and therefore has high research value and broad application prospects. This thesis studies video-based human action recognition from three perspectives: deep neural network structure, feature fusion, and model combination. The methods are evaluated on two public datasets, UCF101 [23] and HMDB51 [22]. The contributions of this thesis can be summarized in the following three aspects:

(1) Human action recognition based on deep neural networks. 2D convolution cannot extract temporal and spatial features at the same time, while 3D convolution has so many parameters that it is difficult to train. To address these problems, the thesis introduces a 3D residual structure and designs a 3D residual model. To capture association features among several consecutive frames, the thesis introduces a 3D attention mechanism, which captures global association features by assigning different attention values to adjacent frames. Experiments demonstrate that both structures improve recognition performance. Building on the strong performance of the 3D residual structure and the 3D attention mechanism, the thesis adopts two fusion strategies to combine them, producing two new structures. Experiments show that the new structures outperform either single structure.

(2) Human action recognition based on feature fusion. The feature extraction layers are decoupled into shallow feature layers and deep feature layers, which represent the same kind of features at different granularities, and the thesis fuses these features by addition and concatenation. Experiments indicate that fusing features at the shallow feature layer with either strategy strengthens the representation of human action features. To further improve recognition accuracy, the thesis uses the Farneback method [81] to extract optical flow from RGB frames, extracts shallow optical flow features, and then fuses them with the shallow RGB features by addition and concatenation, weighted by feature contribution degree. Experiments show that fusing optical flow features performs better than fusing shallow features alone, and that addition fusion based on feature contribution degree performs best.

(3) Human action recognition based on model fusion. For the three models designed in the thesis (the 3D residual model, the 3D attention mechanism model, and the 3D attention residual model), the thesis proposes mean fusion and weighted fusion strategies. Weighted fusion uses a model-weight calculation method to assign higher fusion weights to models with higher accuracy. Experiments indicate that both fusion strategies improve performance to varying degrees, and that weighted model fusion brings an average improvement of 3%.
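The mean and weighted model fusion strategies described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual weight formula, so the rule used here (each model's weight is its accuracy divided by the sum of all accuracies) is only an assumption, and the function names `mean_fusion`, `accuracy_weights`, `weighted_fusion`, and `predict` are hypothetical. Each model is assumed to output a class-probability vector for a video clip.

```python
from typing import List, Sequence


def mean_fusion(probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of several models."""
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def accuracy_weights(accuracies: Sequence[float]) -> List[float]:
    """Assumed rule (not taken from the thesis): weight_i = acc_i / sum(acc)."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def weighted_fusion(probs: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted sum of class-probability vectors, one weight per model."""
    n_classes = len(probs[0])
    return [sum(w * p[c] for w, p in zip(weights, probs))
            for c in range(n_classes)]


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Illustrative numbers only: three models, three classes.
probs = [
    [0.7, 0.2, 0.1],  # e.g. a 3D residual model
    [0.2, 0.6, 0.2],  # e.g. a 3D attention model
    [0.3, 0.5, 0.2],  # e.g. a 3D attention residual model
]
val_accuracies = [0.9, 0.3, 0.3]  # made-up validation accuracies

weights = accuracy_weights(val_accuracies)
mean_pred = predict(mean_fusion(probs))
weighted_pred = predict(weighted_fusion(probs, weights))
```

With these made-up numbers the two strategies disagree: mean fusion follows the two weaker models, while weighted fusion follows the single model with the much higher validation accuracy. That is the behavior the weighted strategy is meant to provide.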
Keywords/Search Tags: Human Action Recognition, 3D Convolution, 3D Attention Mechanism, Multimodal Fusion, Model Fusion