Research On Behavior Recognition Based On Multimodal Fusion

Posted on: 2022-02-04
Degree: Master
Type: Thesis
Country: China
Candidate: C S Xu
Full Text: PDF
GTID: 2518306554950339
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human behavior recognition is a technology that allows computers to autonomously understand human behavior by modeling the spatial relationship between adjacent frames in a video. In non-contact human-computer interaction and robot coaching tasks, robots are required to recognize human hand behaviors. However, current behavior recognition algorithms cannot accurately understand human hand behaviors and do not make good use of the connections between different behaviors encoded in the labels. Research on human hand behavior recognition algorithms is therefore of great significance.

To address the problem of redundant frames when extracting video frames, a key frame extraction algorithm based on minimum structural similarity is proposed. The algorithm uses the structural similarity of two frames to represent the redundant information between them, and eliminates video frames in which the behavior is not obvious.

To address the problem that current video feature extraction models cannot fully exploit the relationships between video frames, a temporal aggregation algorithm based on local information propagation is proposed. The algorithm consists of a three-layer convolutional structure. The first two layers use three different convolution kernel sizes, which overcomes the over-smoothing that occurs when inter-frame information is aggregated multiple times. The third layer is optimized with an inter-frame motion excitation algorithm: it recalculates the features of each video frame through an attention mechanism and enhances the feature representation of consecutive frames.

To address the issue that the prior information in the label features cannot be used directly in the recognition model, the idea of multimodal fusion is introduced and a classification algorithm based on low-rank bilinear fusion is proposed. The algorithm fuses the feature vector of each label with the video feature vector by low-rank bilinear fusion, then computes the video's score for each category from the corresponding fusion result and classifies the video accordingly.

Finally, the public Something-Something dataset is used to test the algorithms. On Something-Something V1, the combination of the local information propagation algorithm and the low-rank bilinear fusion classification algorithm achieves a recognition accuracy of 45.14%, an increase of 9.7%, 6.24%, and 1.78%, respectively, over the three other models compared in this thesis. On Something-Something V2, the recognition accuracy reaches 54.99%, an increase of 6.14% and 3.82%, respectively, over the two other models compared. The results show that the improved information propagation algorithm obtains better behavior features and that the classification algorithm based on low-rank bilinear fusion improves classification accuracy. The two algorithms have reference value for human hand behavior recognition.
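
The label-video fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so what follows is only the generic low-rank bilinear form, score_c = p · ((Uᵀx) ∘ (Vᵀy_c)), which factorizes a full bilinear map xᵀWy_c with W = U diag(p) Vᵀ. The names `U`, `V`, `p`, the feature dimensions, the rank, and the random weights are all assumptions made for illustration; the thesis model may add nonlinearities, biases, or learned label embeddings that are not shown here.

```python
import numpy as np


def low_rank_bilinear_scores(video_feat, label_feats, U, V, p):
    """Score one video against every label with a low-rank bilinear form.

    video_feat:  (video_dim,)             video feature vector x
    label_feats: (num_classes, label_dim) one feature vector y_c per label
    U: (video_dim, rank), V: (label_dim, rank), p: (rank,)  hypothetical weights
    """
    video_proj = video_feat @ U        # project the video into the rank-d space
    label_proj = label_feats @ V       # project every label into the same space
    fused = video_proj * label_proj    # Hadamard product, broadcast over labels
    return fused @ p                   # one scalar score per label


def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


# Toy data: random features and weights stand in for trained ones (assumption).
rng = np.random.default_rng(0)
num_classes, video_dim, label_dim, rank = 5, 16, 8, 4
x = rng.standard_normal(video_dim)
Y = rng.standard_normal((num_classes, label_dim))
U = rng.standard_normal((video_dim, rank))
V = rng.standard_normal((label_dim, rank))
p = rng.standard_normal(rank)

scores = low_rank_bilinear_scores(x, Y, U, V, p)
probs = softmax(scores)
predicted = int(np.argmax(probs))      # index of the highest-scoring label
```

The factorized form needs (video_dim + label_dim + 1) × rank parameters instead of the video_dim × label_dim of a full bilinear matrix, which is the usual motivation for choosing a low rank.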
Keywords/Search Tags: Behavior recognition, Convolutional neural network, Information transport, Multimodal fusion