Deep Transfer Learning For Action Recognition

Posted on: 2021-01-14 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Ma | Full Text: PDF
GTID: 2428330602977693 | Subject: Computer technology
Abstract/Summary:
Action recognition is the video classification task of inferring an action category from video content. As a research field within video understanding that spans perception and cognition, it plays an important role in anomaly detection, human-computer interaction, video retrieval, and other tasks. Good spatiotemporal modeling of actions is very challenging, both because the speed, start and end times, appearance, and posture of human motion in video are uncertain and because of interference from physical factors such as lighting, viewing angle, and occlusion.

The categories in mainstream action datasets are selected arbitrarily, which leads to overlapping categories within the same dataset and to uneven visual similarity between categories. Together with the high redundancy of spatiotemporal convolution and the shortage of labeled samples in existing datasets, this imbalance in the separability of categories seriously harms classification performance. Existing methods, however, focus mainly on extracting accurate and efficient spatiotemporal motion features and ignore whether the task definition itself is reasonable.

Few-shot learning aims to reduce the dependence of machine learning on large numbers of samples. Humans can use previously learned knowledge effectively, so they learn quickly in new task scenarios from only a small number of labeled samples. The diversity of video content and the abstract nature of actions make motion features very difficult to extract in few-shot action tasks. To improve the accuracy of video features, existing methods introduce a temporal feature fusion module into classification, but they place no explicit constraint on the distribution of intra-class features.

This thesis addresses the problem of insufficient training samples caused by the high complexity of the action recognition task and the difficulty of acquiring annotated data. The semantic information of action labels is used as prior knowledge to guide visual classification through transfer learning. Specifically, the main contributions are:

(1) A cost-free hierarchical action recognition method based on the transfer of semantic features, which mines and exploits the inherent structure between categories. First, label semantic features are extracted to uncover the relations between action categories and to quantify how distinguishable the categories are from one another. Then, through a hierarchical loss function and a maximum mean discrepancy (MMD) loss function, the semantic associations serve as prior knowledge that constrains the learning of visual features and makes the classification task more reasonable, by means of feature-based transfer learning. Experiments show that the method can be combined with different action recognition networks to improve model performance, and that it achieves state-of-the-art results on well-known datasets.

(2) A triplet loss function based on the distribution of label semantic features, which directly constrains the distribution of visual features by means of model-based transfer learning. This work introduces the triplet loss into few-shot action classification for the first time. The similarity between categories is measured from the semantic features of the action labels and is used to select negative samples sensibly. This avoids selecting the hardest negative samples every time, and it also bounds the difficulty of the negative samples so that training steps are not wasted on uninformative triplets. The method complements the classification loss and can be combined flexibly and effectively with different measures of temporal feature similarity, which greatly improves classification performance.

In summary, this thesis presents a cost-free transfer learning method that fully exploits the relationships between the action categories of current action recognition datasets in order to guide and constrain the learning of visual features. It offers a more systematic study of action recognition based on deep transfer learning, quantifies the performance improvement brought by introducing additional knowledge, and provides more detailed guidance and reference for follow-up work on intelligent video understanding that builds on the effective fusion of perception and cognition.
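
The semantically guided negative selection described in contribution (2) can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy 2-D label embeddings, the use of cosine similarity, the similarity band `low`/`high`, and the function names `pick_negative_class` and `triplet_loss` are all assumptions made for illustration. The idea shown is only that negatives are drawn from classes whose label semantics are neither nearly identical to the anchor class (the hardest case) nor unrelated to it (too easy to be informative).

```python
import math
import random

# Toy label embeddings (assumed for illustration; a real system would use
# learned word or sentence embeddings of the action labels).
LABEL_EMB = {
    "run":  [1.0, 0.0],
    "jog":  [0.99, 0.14],   # almost identical semantics to "run"
    "walk": [0.8, 0.6],     # related, but distinguishable
    "swim": [0.0, 1.0],     # unrelated
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def pick_negative_class(anchor_cls, label_emb, low=0.2, high=0.9, rng=random):
    """Pick a negative class whose semantic similarity to the anchor class
    lies in [low, high]: not the hardest negatives, not trivially easy ones.
    The band limits are hypothetical hyperparameters."""
    sims = {
        cls: cosine(label_emb[anchor_cls], emb)
        for cls, emb in label_emb.items()
        if cls != anchor_cls
    }
    candidates = [cls for cls, s in sims.items() if low <= s <= high]
    if not candidates:
        # Fallback (assumption): no class falls in the band, so use any other class.
        candidates = list(sims)
    return rng.choice(candidates)

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss on Euclidean distances."""
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)

# Usage: choose the negative class for an anchor of class "run", then score
# one triplet of (hypothetical) visual feature vectors.
neg_cls = pick_negative_class("run", LABEL_EMB)
loss = triplet_loss([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

With these toy embeddings, "jog" is excluded as too similar to "run" and "swim" as too dissimilar, so "walk" is the only candidate. In practice the triplet term would be added to the classification loss with a weighting factor, which is another hyperparameter not specified in the abstract.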
Keywords/Search Tags: Action Recognition, Transfer Learning, Hierarchical Classification, Few-Shot Learning