
Research On Action Recognition Based On Multimodal Feature Learning

Posted on: 2021-05-05  Degree: Master  Type: Thesis
Country: China  Candidate: M M Cui  Full Text: PDF
GTID: 2428330629452687  Subject: Computer application technology
Abstract/Summary:
Video-based action recognition is a hot topic in computer vision, with a wide range of applications in video surveillance, human-computer interaction, video information retrieval, and intelligent driving. With the explosive growth of video data on the Internet in recent years, achieving effective and intelligent understanding and analysis of video data has become critical. Traditional methods that rely only on hand-crafted feature extraction have many limitations and do not scale to massive video data, whereas deep learning methods, especially deep convolutional neural networks, have made great progress in this area.

The goal of action recognition research is to recognize and understand the actions of people in a video and output the corresponding labels. Beyond the spatial information present in a two-dimensional image, video data adds temporal information about how an action unfolds. Owing to the complexity of human behavior, viewpoint changes, background noise, and other factors, efficiently, accurately, and comprehensively extracting the spatiotemporal characteristics of actions and designing a reasonable, effective network structure remains a challenge.

To address these problems, this thesis designs a network based on multimodal feature learning for action recognition in video. The traditional two-stream method extracts spatial features from RGB images and temporal features from optical flow, but in this method the temporal information can only be extracted manually. To capture spatiotemporal features more fully, this thesis therefore adds an improved three-dimensional residual convolutional neural network to the two-stream framework: the spatial features learned by the 2D spatial stream, the temporal features learned by the 2D temporal stream, and the spatiotemporal features learned by the 3D network are combined by a weighted fusion of their category scores. Following the idea of modeling long-range temporal structure, sparse sampling is used to avoid a large amount of spatiotemporal redundancy. In the 3D residual convolutional neural network, each 3×3×3 convolution is decomposed into a 1×3×3 convolution and a 3×1×1 convolution, which is equivalent to adding one-dimensional temporal feature extraction on top of a two-dimensional spatial convolution; in addition, global average pooling is used in place of fully connected layers, effectively reducing the number of model parameters. Learning multimodal features in this way effectively improves the recognition performance of the model.

Experiments are performed on two commonly used datasets, HMDB-51 and UCF-101. The network is trained with methods such as data augmentation and cross-modality pre-training to reduce the risk of overfitting. The experimental results show that the proposed method effectively improves recognition accuracy and achieves good recognition results on both datasets.
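The weighted fusion of category scores from the three streams can be sketched as below. This is a minimal illustration of late score fusion; the stream weights and score values are assumptions for the example, not the values used in the thesis.

```python
from math import exp

def softmax(scores):
    """Convert raw per-class scores to probabilities."""
    m = max(scores)
    e = [exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]

def fuse_scores(spatial, temporal, three_d, weights=(1.0, 1.5, 1.0)):
    """Late fusion: weighted sum of the three streams' class probabilities.

    `spatial`, `temporal`, `three_d` are per-class score lists from the
    2D spatial stream, the 2D temporal stream, and the 3D network.
    Returns the index of the predicted action class.
    """
    probs = [softmax(s) for s in (spatial, temporal, three_d)]
    fused = [sum(w * p[i] for w, p in zip(weights, probs))
             for i in range(len(spatial))]
    return fused.index(max(fused))
```

Giving the optical-flow (temporal) stream a slightly higher weight is a common choice in two-stream work; in practice the weights are tuned on a validation set.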
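The sparse sampling used to avoid spatiotemporal redundancy can be illustrated with a segment-based scheme in the style of temporal segment networks: the video is split into a few equal segments and one frame is drawn from each, so a handful of frames covers the whole clip. The segment count here is an assumption for the sketch.

```python
import random

def sparse_sample(num_frames, num_segments=3, rng=None):
    """Split a clip of `num_frames` frames into `num_segments` equal
    segments and draw one random frame index from each segment.

    Dense frame-by-frame sampling is highly redundant; one frame per
    segment still spans the full temporal extent of the action.
    """
    rng = rng or random.Random()
    seg = num_frames // num_segments
    return [k * seg + rng.randrange(seg) for k in range(num_segments)]
```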
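The two parameter reductions described above (factorizing the 3×3×3 convolution into 1×3×3 + 3×1×1, and replacing fully connected layers with global average pooling) can be made concrete with a quick count. The channel widths, feature-map size, and class count below are illustrative assumptions, not the thesis's actual architecture.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a 3D convolution with kernel (kt, kh, kw), bias ignored."""
    return c_in * c_out * kt * kh * kw

# Factorizing one 3x3x3 convolution (64 -> 64 channels) into a 1x3x3
# spatial convolution followed by a 3x1x1 temporal convolution:
full = conv3d_params(64, 64, 3, 3, 3)
factored = conv3d_params(64, 64, 1, 3, 3) + conv3d_params(64, 64, 3, 1, 1)
# full = 110592 weights, factored = 49152: the 1D temporal kernel is
# added on top of the 2D spatial kernel at far lower cost.

# Replacing a fully connected layer over a flattened 4x7x7x512 feature
# map with global average pooling before a 101-class classifier:
fc_params = 4 * 7 * 7 * 512 * 101   # flatten, then fully connected
gap_params = 512 * 101              # pool to one value per channel first
```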
Keywords/Search Tags:Action recognition, Deep learning, Multimodal features, Convolutional neural network