
Human Action Recognition Based On Multi-mode Feature Fusion

Posted on: 2020-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2428330578460292
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, human action recognition in videos has become an important research topic in computer vision, with wide applications in behavior detection, intelligent video surveillance, and medical care. At the same time, the motion in a video is affected by illumination and viewing angle, and there are further interference factors such as large intra-class differences within the same category, all of which make human action recognition in video more difficult. Researchers therefore need to keep developing new methods to find representative features for identifying the action in a video, which is one of the key points of action recognition. What makes video classification harder than image classification is that it must capture not only the spatial appearance characteristics but also the temporal information across the sequence of video frames. Given the success of deep learning in image processing in recent years, many researchers have also studied spatial and temporal features based on deep learning. However, a single feature modality does not achieve the best results in video action recognition. To address this, a multi-mode feature fusion method is proposed. The main contributions of this thesis are threefold:

(1) The thesis conducts an extensive survey of human action recognition, summarizes the domestic and international research status of the topic, identifies the problems of single features in previous methods, and proposes the idea of multi-mode feature fusion.

(2) To address the shortcomings of single features, such as their lack of representativeness, a human action recognition method based on deep networks and feature fusion is proposed. This method uses a convolutional neural network (CNN) and a long short-term memory network (LSTM) to extract temporal and spatial features from video sequences. In addition, a second channel is added that preprocesses the video sequence with adaptive threshold binarization and an XOR operation on consecutive frames, after which a CNN extracts global motion and spatiotemporal features. The method is evaluated on a public dataset; the results show that the two kinds of features are complementary, and their fusion yields a more representative description. The accuracy on UCF50 is 7.3% higher than with the single spatiotemporal feature.

(3) Because the global motion information lacks detail and background information, a human action recognition method based on multi-mode feature fusion is proposed. In the above structure, the context information between two frames of the video is added as a third channel, forming a three-channel multi-modal feature fusion structure (see the illustrative sketch below). The fused features are evaluated on the UCF50 dataset; the result is 5.1% higher than the previous one, indicating that correctly exploiting the complementarity between multi-modal features can effectively improve the representativeness of the fused features.
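The following is a minimal sketch of the kind of three-channel fusion the abstract describes, assuming PyTorch and OpenCV: one channel models per-frame appearance with a CNN followed by an LSTM, one channel applies adaptive threshold binarization and XOR to consecutive frames before a CNN, and one channel feeds an inter-frame difference (context) image to a CNN, with simple concatenation fusion. The network sizes, feature dimensions, class count, and the `global_motion_map` / `ThreeChannelFusion` names are illustrative assumptions, not the thesis's actual architecture.

```python
# Hedged sketch only: layer widths, fusion strategy and helper names are assumptions.
import cv2
import torch
import torch.nn as nn


def global_motion_map(prev_frame, curr_frame, block_size=11, c=2):
    """Adaptive-threshold binarization of two consecutive frames followed by XOR,
    giving a rough global-motion mask (assumed preprocessing, per the abstract)."""
    def binarize(frame):
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return cv2.adaptiveThreshold(gray, 255,
                                     cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY, block_size, c)
    return cv2.bitwise_xor(binarize(prev_frame), binarize(curr_frame))


class ThreeChannelFusion(nn.Module):
    """Toy three-channel structure: (1) CNN+LSTM spatiotemporal features,
    (2) CNN on XOR motion maps, (3) CNN on an inter-frame context image."""
    def __init__(self, num_classes=50, feat_dim=128):
        super().__init__()

        def small_cnn(in_ch):
            # Tiny backbone used for all channels in this sketch.
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim))

        self.rgb_cnn = small_cnn(3)        # per-frame appearance features
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.motion_cnn = small_cnn(1)     # XOR motion-map channel
        self.context_cnn = small_cnn(3)    # frame-difference (context) channel
        self.classifier = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, frames, motion_maps, context):
        # frames: (B, T, 3, H, W); motion_maps: (B, 1, H, W); context: (B, 3, H, W)
        b, t = frames.shape[:2]
        per_frame = self.rgb_cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(per_frame)          # temporal modelling of appearance
        fused = torch.cat([h[-1],
                           self.motion_cnn(motion_maps),
                           self.context_cnn(context)], dim=1)
        return self.classifier(fused)             # concatenation fusion + classifier
```

In this sketch the three feature vectors are simply concatenated before the classifier; the thesis only states that the channels are fused, so any weighting or alternative fusion scheme it may use is not reflected here.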
Keywords/Search Tags: Human action recognition, Feature fusion, Convolutional neural network, LSTM