
Research On Human Action Recognition Based On Multiple Semantics

Posted on: 2022-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: J F Chen
GTID: 2518306605977459
Subject: Master of Engineering
Abstract/Summary:
At present, human action recognition methods rely mainly on visual features. Visual features are those extracted directly from the video, such as the color of a frame, the posture of a person, or the shape of an object. Visual features carry no explicit semantic information, and this lack causes the computer and the user to interpret the video inconsistently, which leads to a semantic gap. Semantic information is therefore essential to bridging that gap. Accordingly, this thesis proposes three semantics-based action recognition methods, each built on a different kind of semantic information. The main contents are as follows:

(1) An action recognition method based on the alignment of video and text is proposed. In existing methods, the semantic pose sequence does not use real semantic information, and the alignment of a visual pose depends only on the previous transition state. To address these problems, an unsupervised discriminant alignment model (UDAM) is proposed. The model learns the alignment between the semantic pose sequence and the visual pose sequence through a recurrent neural network. Because no pre-labeled reference alignment pairs are available, noise-contrastive estimation is used to train the model without supervision. With the trained model, action classification becomes the problem of finding, for a given visual pose sequence, the semantic pose sequence with the highest alignment score. The model's performance was verified on the MSRC-12 and WorkoutUOW-18 datasets.

(2) An action recognition method fusing an iterative network of labels is proposed. Non-end-to-end networks suffer from error propagation, and action labels cannot be used directly in the testing phase. Therefore, on the assumption that noisy action labels help action classification, an action recognition method fusing an iterative network of labels (Iter-Net-Labels) is proposed. The method iteratively reuses the classification result of the previous inference pass and assumes that the noise in the action label can be corrected over multiple iterations. Specifically, the Iter-Net-Labels network improves the traditional convolutional bidirectional Long Short-Term Memory network by adding the action label from the previous iteration to the feature representation of the next iteration. Experimental results show that the accuracy of this method on the MSRC-12 dataset increases by nearly 14% at the highest and 1.26% at the lowest. On the WorkoutUOW-18 dataset, the accuracy of action classification increases by 44% at the highest and 0.29% at the lowest.

(3) An action recognition method fusing an iterative network of texts is proposed. Action labels cannot explain how an action is executed. To make full use of the semantic information of the action, this method combines the task of generating a text description of the action video with the action recognition task, and uses the generated text description of the action (or the generated semantic pose sequence) to guide recognition. The resulting method is called Iter-Net-Texts. The Iter-Net-Texts network improves the Convolutional Encoder-Decoder Architecture model: the text description of the action (or semantic pose sequence) from the previous iteration is added to the feature representation of the next iteration. Experimental results show that the accuracy of action classification on the MSRC-12 dataset increases by 18% at the highest and 1.07% at the lowest. On the WorkoutUOW-18 dataset, the accuracy of action classification increases by 3%.
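The iterative feedback idea shared by methods (2) and (3) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract describes a convolutional bidirectional LSTM, whereas the sketch below swaps in a toy linear scorer so that only the feedback loop is visible. All names (`classify`, `iterative_predict`, `weights`) and all numeric values are hypothetical and chosen purely for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, prev_label_probs, weights):
    """Toy stand-in for the network (assumption: a linear scorer, not a
    convolutional BiLSTM). The input is the visual feature vector
    concatenated with the label distribution of the previous iteration."""
    x = list(features) + list(prev_label_probs)
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return softmax(scores)

def iterative_predict(features, weights, num_classes, num_iters=3):
    """Feed each iteration's predicted label distribution into the next."""
    # No prediction exists before the first pass, so start from a uniform
    # distribution (an assumption of this sketch).
    probs = [1.0 / num_classes] * num_classes
    history = []
    for _ in range(num_iters):
        probs = classify(features, probs, weights)
        history.append(probs)
    return history

# Hypothetical example: 2 visual features, 2 action classes.
# Each weight row covers [feature_0, feature_1, prev_prob_0, prev_prob_1].
weights = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 2.0],
]
features = [0.6, 0.4]
history = iterative_predict(features, weights, num_classes=2, num_iters=3)
predicted_class = max(range(2), key=lambda c: history[-1][c])
```

In this toy setting the previous prediction reinforces itself on each pass, so the probability of the leading class grows from iteration to iteration. Whether repeated passes correct label noise in practice depends on the trained network, which the sketch does not model.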
Keywords/Search Tags:Action recognition, Iterative network, Semantic pose, Action labels, Text description