
Research On Image-based Action Recognition Based On Context And Feature Fusion

Posted on: 2022-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J L Yu
Full Text: PDF
GTID: 2518306509954699
Subject: Software engineering
Abstract/Summary:
Image-based human action recognition is a challenging task in computer vision. Because a still image carries no temporal information and is affected by variations in human posture, scene and illumination, the central problem is how to extract spatial cues from the image that effectively represent human behavior. In recent years, with the rapid development of deep learning, convolutional neural networks have achieved great success in computer vision, which has further advanced research on image-based action recognition. Building on deep learning methods, this thesis studies image-based human action recognition models and proposes two approaches that address problems in current action recognition methods. The details are as follows:

1. Most current action recognition methods focus only on human appearance features and ignore the important role of contextual information in the task. This thesis therefore proposes a contextual multi-branch attention network. The model consists mainly of a target human branch network, a region-level attention branch network and a scene-level attention branch network. The region-level and scene-level attention branches use an attention mechanism to capture local contextual information and global contextual cues related to human behavior in the image, providing richer contextual features for the model. To further capture contextual features near the target human, a context convolution module is added to the target human branch to supplement the human appearance features. Finally, a weighted fusion of the three branch networks produces the final prediction. Comparisons with other traditional and deep learning methods on the PASCAL VOC 2012 Action and Stanford 40 Actions datasets demonstrate the effectiveness of the approach for human action recognition.

2. Some action recognition methods require additional auxiliary information (human key-points, object bounding boxes, etc.), which makes them difficult to apply widely. To address this, this thesis proposes a part feature fusion network that needs no auxiliary information. Under the supervision of image-level labels, the network adaptively fuses three kinds of part features: easy parts, hard parts and background parts. The model consists mainly of an attention module and a two-level classification network. The two-level classification network comprises part-level and image-level loss functions. The part-level loss function assigns different weights to different parts through an adaptive mechanism, and the image-level loss function fuses and retrains the important part features. In addition, to extract more implicit information from the image, an attention module that includes a classification loss function is added to the backbone to further strengthen the network's feature representation. To verify the effectiveness of this approach, extensive experiments are carried out on the PPMI, Stanford 40 Actions and PASCAL VOC 2012 Action datasets. The experimental results show that the method achieves the best performance on the PPMI dataset and performs better than most current action recognition models.
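The weighted fusion of the three branch networks in the first approach can be illustrated with a minimal sketch. The abstract does not say whether fusion operates on logits or probabilities, nor how the weights are chosen, so everything here is an assumption made for illustration: the function names `softmax` and `weighted_fusion`, the choice to fuse softmax probabilities, the fixed weights, and the toy scores are all hypothetical and are not taken from the thesis.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(branch_logits, weights):
    """Fuse per-branch class scores as a weighted sum of softmax probabilities.

    Assumption: fusion is done on probabilities with fixed, normalized
    weights. The thesis may instead learn the weights or fuse earlier.
    """
    if len(branch_logits) != len(weights):
        raise ValueError("one weight is required per branch")
    total_w = sum(weights)
    norm_w = [w / total_w for w in weights]  # weights sum to 1
    num_classes = len(branch_logits[0])
    fused = [0.0] * num_classes
    for logits, w in zip(branch_logits, norm_w):
        probs = softmax(logits)
        for c in range(num_classes):
            fused[c] += w * probs[c]
    return fused


# Hypothetical scores for three action classes from each branch.
human_branch = [2.0, 0.5, 0.1]    # target human branch
region_branch = [0.3, 1.8, 0.2]   # region-level attention branch
scene_branch = [1.5, 0.4, 0.3]    # scene-level attention branch

fused = weighted_fusion(
    [human_branch, region_branch, scene_branch],
    weights=[0.5, 0.25, 0.25],    # illustrative weights only
)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the weights are normalized and each branch contributes a probability distribution, the fused scores also form a distribution, and the predicted action is simply the class with the largest fused score.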
Keywords/Search Tags:action recognition, deep learning, contextual information, feature fusion, attention mechanism