Multi-view Feature Learning Based On Skeleton And Image Data And Its Application In Behavior Recognition

Posted on:2021-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:D S Guo

GTID:2428330614465976

Subject:Electronic and communication engineering

Abstract/Summary:

With advances in artificial intelligence and computing power, human behavior recognition based on deep learning has become an active and challenging research topic. Because the technology has a wide range of applications in everyday life, research on it has great practical value. Existing behavior recognition methods usually use only single-modal data, such as images or skeletons. Images and video contain intuitive scene information but are easily affected by lighting and occlusion. Skeleton data represents the three-dimensional coordinates of human joints in each video frame, and it captures both the spatial structure of the skeleton and its temporal dynamics. Skeleton data is also robust to occlusion and complex backgrounds, but it lacks appearance details. The image and skeleton modalities are therefore highly complementary. This thesis performs multi-view feature learning on the skeleton and image modalities and combines their complementary information to improve the accuracy of behavior recognition.

Deep neural network models suited to the characteristics of each modality are studied separately. For continuous video frames, the video can be decomposed into image data and optical flow data, so a two-stream convolutional neural network architecture is used to extract spatiotemporal information from the video. The traditional two-stream network, however, cannot learn long-term spatiotemporal features of the human body in video. To address this shortcoming, this thesis proposes a convolutional recurrent fusion method. The method uses a recurrent neural network to model long sequences of video frames and extract their long-term dependencies. It also combines the convolution operation with the recurrent architecture to fuse the spatiotemporal features output by the two-stream network, exploiting the complementarity between images and optical flow to learn long-term human motion features in video. In addition, this thesis proposes an RNN attention mechanism that lets the network learn to focus on regions related to human behavior at different moments.

For skeleton data, graph convolutional networks are better suited to modeling such non-Euclidean data. The joints of the human body are connected to form an irregular undirected graph, and the graph convolutional network can extract and combine local spatial features and temporal features from the sequence of human key points.

So that the algorithm captures long-term human motion features in behavior videos while also combining posture and joint information to improve recognition accuracy, this thesis proposes an efficient two-stream network for feature learning on skeleton and image data. Large-scale neural networks have a very large number of parameters, so end-to-end training is difficult to carry out and to converge. The convolutional recurrent fusion network and the graph convolutional network are therefore trained first, and their scores are then fused, which reduces the difficulty of training and parameter tuning and improves overall accuracy. The method is tested on two behavior recognition datasets, UCF101 and HMDB51, and comparison with current mainstream video behavior recognition methods verifies its effectiveness...
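The score-fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the scores of the image-based network and the skeleton-based network are combined, so the weighted average of softmax probabilities below is an assumption, and the names `fuse_scores` and `image_weight` are hypothetical. The logits are toy values, not results from the thesis.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fuse_scores(image_logits, skeleton_logits, image_weight=0.5):
    """Late fusion of two separately trained streams.

    Assumption: a weighted average of per-class probabilities.
    image_weight is a hypothetical hyperparameter, not a value from the thesis.
    """
    if image_logits.shape != skeleton_logits.shape:
        raise ValueError("both streams must score the same clips and classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    p_image = softmax(image_logits)
    p_skeleton = softmax(skeleton_logits)
    return image_weight * p_image + (1.0 - image_weight) * p_skeleton


# Toy scores for 2 clips and 3 classes (illustrative values only).
image_logits = np.array([[2.0, 1.0, 0.1],
                         [0.2, 0.1, 3.0]])
skeleton_logits = np.array([[0.5, 2.5, 0.3],
                            [0.1, 0.2, 2.0]])

fused = fuse_scores(image_logits, skeleton_logits, image_weight=0.5)
predictions = fused.argmax(axis=1)  # predicted class index per clip
```

Because each stream is trained on its own and only the output scores are combined, the fusion step adds no trainable parameters. In a sketch like this, the weight could be chosen on a validation set.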

Keywords/Search Tags:

Behavior recognition, recurrent neural network, attention mechanism, graph convolution, multi-view feature extraction

Related items

1	Research On Sensor Activity Recognition Based On Improved Deep Recurrent Neural Network
2	Research On Emotion Recognition Based On EEG
3	Research And Application Of Text Event Extraction Methods
4	Action Recognition Based On Convolution Recurrent Neural Network With Attention Mechanism
5	Research On Graph Convolution Neural Network Based On Multi-attention Mechanism For Human Action Recognition
6	Research And Design Of Bitcoin Transaction Behavior Analysis System
7	Hand Gesture Recognition Method Based On Recurrent Three Dimensional Convolutional Neural Network And Attention Mechanism
8	Research On Sensor Activity Recognition Based On Recurrent Neural Network
9	Sentiment Analysis And User Behavior Research Based On Online Reviews
10	Research On Name Entity Recognition And Relationship Extraction Based On Attention Mechanism In Biomedical Text