
Research on First-Person Action Recognition Algorithms

Posted on: 2022-03-17  Degree: Master  Type: Thesis
Country: China  Candidate: G C Zhang  Full Text: PDF
GTID: 2518306512971949  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Smart wearable devices have developed rapidly in recent years, and wearable cameras are now widely used in many fields, producing large amounts of egocentric video data. As a result, activity recognition in first-person vision has attracted growing research attention. Analyzing egocentric video enables real-time monitoring and status assessment of the wearer, with important applications in healthcare, virtual reality, smart homes, and other areas, and the first-person view offers a new perspective for analyzing how people interact with objects. Egocentric video also poses distinctive challenges: the wearer's own posture cannot be observed, head movement causes severe camera shake and large viewpoint changes, and the scenes are complex and highly variable. All of these make egocentric activity recognition difficult.

In recent years, deep learning has made great progress in computer vision research. Building on deep learning techniques, this thesis studies short-term, fine-grained activity recognition in first-person vision.

First, because a conventionally sampled frame sequence may contain little information about the manipulated object, this thesis proposes a segmented sampling method that splits the video according to the change of state of the manipulated object during the activity, so that the network can extract more information about that object.

Second, to cope with occlusion of the manipulated object and confusion between similar objects, the object is localized with class activation maps, and two associated-feature extraction methods are proposed on top of this localization: a neighborhood method and a grouping method. The associated features strengthen the representation of the manipulated object and thereby improve the network's ability to discriminate between objects.

Third, to address insufficient feature fusion and the redundant information introduced during associated-feature extraction, two multi-feature fusion networks are designed: a neighborhood fusion network and a group fusion network. They integrate the features thoroughly, exploit the complementary strengths of the multiple features, and thus improve activity recognition performance.

Finally, to further improve the model's recognition performance and generalization on egocentric activities, the thesis draws on the idea of multi-task learning and designs multi-task learners for the two fusion networks. For the neighborhood fusion network, action and manipulated-object labels are introduced as auxiliary supervision signals, improving the network's ability to recognize actions and objects and, in turn, its activity recognition performance. For the group fusion network, four independent classifiers are trained on the different feature groups and their outputs are combined to improve the overall recognition ability of the network.

Comparative experiments on a standard egocentric daily-activity dataset verify the effectiveness of the proposed methods.
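As a rough illustration of the segmented sampling idea, the Python sketch below samples frame indices segment by segment. The abstract does not specify how segment boundaries are derived from the manipulated object's change of state, so the `boundaries` list here is supplied by hand and is purely hypothetical.

```python
import numpy as np

def segmented_sample(boundaries, per_segment=1, rng=None):
    """Sample frame indices segment by segment.

    `boundaries` marks assumed object-state changes, e.g. [0, 40, 75, 100]
    for a 100-frame clip; deriving these boundaries from the video itself
    is the thesis's contribution and is not reproduced here.
    """
    rng = rng or np.random.default_rng(0)
    picked = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        # Draw from every state segment so each object state is
        # represented in the frame sequence fed to the network.
        picked.extend(sorted(int(i) for i in rng.integers(start, end, size=per_segment)))
    return picked

# A 100-frame clip with assumed state changes at frames 40 and 75.
print(segmented_sample([0, 40, 75, 100], per_segment=2))
```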
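The localization step relies on class activation maps. Below is a minimal PyTorch sketch of the standard CAM computation plus a thresholded bounding box standing in for the object-localization step; the neighborhood and grouping feature extraction built on top of the located region are not reproduced, and all tensors are random stand-ins for a real backbone's outputs.

```python
import torch

def class_activation_map(feat, fc_weight, cls):
    """Standard CAM: weight the last conv feature map (C, H, W) by the
    classifier weights (num_classes, C) of the target class `cls`."""
    cam = torch.einsum('c,chw->hw', fc_weight[cls], feat)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam

def object_box(cam, thresh=0.5):
    """Bounding box (y0, y1, x0, x1) of the high-activation region; a crop
    around this box would feed the neighborhood feature extraction."""
    ys, xs = torch.nonzero(cam >= thresh, as_tuple=True)
    return ys.min().item(), ys.max().item(), xs.min().item(), xs.max().item()

# Toy usage with random tensors in place of a real network's outputs.
feat = torch.rand(512, 7, 7)   # conv features of one frame
fc_w = torch.rand(100, 512)    # weights of a 100-class linear classifier
cam = class_activation_map(feat, fc_w, cls=3)
y0, y1, x0, x1 = object_box(cam)
print(f"object region on the 7x7 map: rows {y0}-{y1}, cols {x0}-{x1}")
```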
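Finally, the auxiliary-task idea for the neighborhood fusion network can be sketched as a multi-task head in PyTorch. The class counts, feature dimension, and auxiliary loss weight below are invented for illustration; the abstract only states that action and object labels supervise auxiliary classifiers alongside the main activity classifier.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """One shared feature vector feeds three linear classifiers: the main
    activity task plus two auxiliary tasks (action and manipulated object).
    All dimensions here are hypothetical."""
    def __init__(self, feat_dim=2048, n_activity=106, n_action=19, n_object=53):
        super().__init__()
        self.activity = nn.Linear(feat_dim, n_activity)
        self.action = nn.Linear(feat_dim, n_action)
        self.object = nn.Linear(feat_dim, n_object)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, feat, y_act, y_verb, y_obj, aux_weight=0.5):
        # Main loss plus down-weighted auxiliary losses; the auxiliary
        # supervision pushes the shared features to encode both the action
        # and the object, which is the idea described in the abstract.
        return (self.ce(self.activity(feat), y_act)
                + aux_weight * self.ce(self.action(feat), y_verb)
                + aux_weight * self.ce(self.object(feat), y_obj))

# Toy batch: 4 random feature vectors and random labels.
head = MultiTaskHead()
feat = torch.randn(4, 2048)
loss = head(feat, torch.randint(106, (4,)), torch.randint(19, (4,)),
            torch.randint(53, (4,)))
print(loss.item())
```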
Keywords/Search Tags: First-person vision, Activity recognition, Key frame extraction, Multi-feature fusion, Multi-task learning