
Research On First-view Video Action Recognition Technology Based On Multi-feature Fusion

Posted on: 2021-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Jiang
Full Text: PDF
GTID: 2518306512479054
Subject: Software engineering
Abstract/Summary:
With the emergence of cost-effective intelligent cameras and the rapid development of video social platforms, videos recorded from the first-person perspective have been flooding into people's lives in recent years. Research in egocentric vision has enormous potential applications, and first-person action recognition, as the cornerstone of video analysis, has received increasing attention from academia and industry. However, the exploration of action recognition in egocentric videos is still at a preliminary stage, and only a few theoretical studies currently focus on it. Egocentric videos differ significantly from third-person videos in visual content and are heterogeneous in nature.

The main work of this thesis is summarized as follows:

Firstly, a cross-feature fusion architecture is designed for the egocentric interaction scenario. In this architecture, global and local branches are used to model the motion of the different participants, and each branch deploys multimodal, multi-stream C3D networks to extract complementary spatiotemporal representations. Cross fusion is leveraged to eliminate redundancy and establish effective links between the two branches, which leads to a significant improvement in the accuracy of first-person interaction recognition.

Secondly, a two-stream attention 3D feature fusion network is proposed for the egocentric daily-activity scenario. In this network, a 3D attention module is applied to feature maps to suppress noise in spatiotemporal cues, while a modal attention module is applied to feature vectors to explore the importance of each modality. Ablation experiments confirm the effectiveness of the designed modules and show that the proposed algorithm acquires more discriminative feature representations.

Finally, a first-person action recognition system is designed and implemented. The system encapsulates the multi-feature fusion algorithms, enabling users to configure data and models and perform feature fusion interactively. In addition, intermediate results are fed back to the user interface to show the efficiency and recognition performance of the algorithms intuitively.
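The following is a minimal PyTorch-style sketch of the cross-feature fusion idea from the first contribution: a global branch and a local branch, each with multimodal streams, fused by a learned cross-branch gate. The backbone, feature dimensions, modality choices (RGB and optical flow), and the gating form of the cross fusion are illustrative assumptions, not the exact design of the thesis.

```python
import torch
import torch.nn as nn

class Stream3D(nn.Module):
    """Stand-in for a C3D-style backbone: a shallow 3D-conv stack that maps
    a clip (B, C, T, H, W) to a pooled feature vector (B, feat_dim)."""
    def __init__(self, in_channels: int, feat_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, feat_dim, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )

    def forward(self, x):
        return self.conv(x).flatten(1)  # (B, feat_dim)

class CrossFusionNet(nn.Module):
    """Global branch (whole frame) and local branch (e.g. a crop around the
    interacting person), each with an RGB stream and a flow stream; a learned
    gate performs the cross fusion between the two branches."""
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        self.global_rgb, self.global_flow = Stream3D(3, feat_dim), Stream3D(2, feat_dim)
        self.local_rgb, self.local_flow = Stream3D(3, feat_dim), Stream3D(2, feat_dim)
        # Cross fusion (illustrative): a softmax gate weights the two branches.
        self.gate = nn.Sequential(nn.Linear(4 * feat_dim, 2), nn.Softmax(dim=1))
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, g_rgb, g_flow, l_rgb, l_flow):
        g = torch.cat([self.global_rgb(g_rgb), self.global_flow(g_flow)], dim=1)
        l = torch.cat([self.local_rgb(l_rgb), self.local_flow(l_flow)], dim=1)
        w = self.gate(torch.cat([g, l], dim=1))  # (B, 2) branch weights
        fused = w[:, 0:1] * g + w[:, 1:2] * l    # weighted cross fusion
        return self.classifier(fused)
```

Likewise, a hedged sketch of the two attention modules from the second contribution: a 3D attention that re-weights positions of a spatiotemporal feature map, and a modal attention that weights per-modality feature vectors before fusion. The module names and the specific attention forms (sigmoid gating over positions, softmax over modalities) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class Attention3D(nn.Module):
    """3D attention over a spatiotemporal feature map (B, C, T, H, W): a 1x1x1
    conv produces a saliency map that re-weights every position, suppressing
    noisy spatiotemporal locations."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x):
        attn = torch.sigmoid(self.score(x))  # (B, 1, T, H, W)
        return x * attn                      # re-weighted feature map

class ModalAttention(nn.Module):
    """Modal attention over per-modality feature vectors: learns a softmax
    weight for each modality (e.g. RGB, optical flow) and fuses them."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, feats):                # list of (B, feat_dim) tensors
        stacked = torch.stack(feats, dim=1)  # (B, M, feat_dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # (B, M, 1)
        return (weights * stacked).sum(dim=1)                # (B, feat_dim)

# Example usage with random clips/features (shapes are illustrative):
if __name__ == "__main__":
    fmap = torch.randn(2, 256, 8, 14, 14)
    fmap = Attention3D(256)(fmap)                     # same shape, re-weighted
    rgb_vec, flow_vec = torch.randn(2, 256), torch.randn(2, 256)
    fused = ModalAttention(256)([rgb_vec, flow_vec])  # (2, 256)
```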
Keywords/Search Tags: egocentric videos, action recognition, multi-modality, multi-feature fusion