Action Recognition Based On Spatiotemporal Convolution Networks

Posted on: 2021-05-02 | Degree: Master | Type: Thesis
Country: China | Candidate: K Yang | Full Text: PDF
GTID: 2518306308970169 | Subject: Computer Science and Technology
Abstract/Summary:
Recognizing actions in videos from temporal information is a challenging problem in computer vision. Most previous methods, such as 3D CNNs (convolutional neural networks) and two-stream CNNs, use only features containing global temporal information as the video representation and ignore the importance of local temporal features. Action recognition based on spatiotemporal convolutional networks can analyze spatial and temporal information simultaneously, and so can effectively capture the intention of the action performer in a video. To address the neglect of local temporal features, we propose long and short sequence concerned networks (LSCN), built on a temporal interaction perception module that combines different kinds of temporal information. LSCN uses the interactions among temporal features from different convolution layers to strengthen the temporal structure and the representation of videos, and it accounts for the different temporal information needed by long-sequence and short-sequence actions.

This paper also proposes a new data augmentation method, background enhancement. The algorithm uses the background information of a video, fusing it with the video frames to generate new data that resembles video captured by a camera from a long distance. This broadens the data distribution that a model can learn during training, which helps optimize the parameters and improves the model's generalization ability.

Based on the characteristics of real-time video streams, this paper designs an online action recognition system and puts forward solutions to the problems encountered in its practical application.

Experimental results show that LSCN, built on 3D ResNeXt-101 and using the background enhancement algorithm, generalizes well on two public datasets (UCF101 and HMDB51). Compared with the basic network, accuracy improves by 0.5% and 3.1%, respectively.
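
The abstract does not spell out how background enhancement fuses the background with the frames, so the following is only a rough sketch of one plausible reading, not the thesis's algorithm. It assumes the background can be approximated by a per-pixel temporal median, and that the "long distance" effect can be imitated by shrinking each frame and pasting it onto that background. The function names, the `factor` parameter, and the centered placement are all hypothetical.

```python
import numpy as np


def estimate_background(frames: np.ndarray) -> np.ndarray:
    """Approximate the static background of a clip.

    Assumption (not from the thesis): the per-pixel temporal median
    over all frames is a usable background estimate.
    frames has shape (T, H, W, C).
    """
    return np.median(frames, axis=0).astype(frames.dtype)


def background_enhance(frames: np.ndarray, factor: int = 2) -> np.ndarray:
    """Generate a new clip that looks as if filmed from farther away.

    Each frame is downscaled by `factor` (nearest-neighbor, via strided
    slicing) and pasted at the center of the estimated background.
    The output has the same shape and dtype as the input.
    """
    t, h, w, _ = frames.shape
    background = estimate_background(frames)

    # Strided slicing yields ceil(h / factor) rows and ceil(w / factor) columns.
    small = frames[:, ::factor, ::factor]
    small_h, small_w = small.shape[1], small.shape[2]
    top = (h - small_h) // 2
    left = (w - small_w) // 2

    # Start from T copies of the background, then overwrite the center region.
    out = np.repeat(background[None], t, axis=0)
    out[:, top:top + small_h, left:left + small_w] = small
    return out


if __name__ == "__main__":
    # Synthetic 16-frame RGB clip standing in for real video data.
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(16, 112, 112, 3), dtype=np.uint8)
    augmented = background_enhance(clip, factor=2)
    print(augmented.shape, augmented.dtype)
```

In a training pipeline, a clip augmented this way would keep the label of its source clip and be added alongside the original data. A real implementation would probably use proper interpolation for the downscaling and a more robust background model.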
Keywords/Search Tags: action classification, action recognition, temporal features, features fusion, background enhancement