Research On Action Recognition In Skeleton Sequences-videos Based On Deep Learning

Posted on: 2021-02-18 | Degree: Master | Type: Thesis
Country: China | Candidate: D Tian | Full Text: PDF
GTID: 2428330614956685 | Subject: Aerospace and information technology
Abstract/Summary:
With the rapid development of computer vision, action recognition has become an important research field that attracts growing interest from researchers. Human action recognition has broad application prospects in transportation, medicine, sports, education, virtual reality and security monitoring. According to the type of input data, action recognition can be divided into skeleton sequence-based recognition and video-based recognition. A skeleton sequence records information about the human joints over continuous time, such as their three-dimensional spatial coordinates. This thesis studies both tasks in detail and, informed by the performance of mainstream algorithms, proposes an improved algorithm for each. The main contents of this thesis are as follows:

1. For skeleton sequence-based action recognition, this thesis draws inspiration from human visual mechanisms and discusses the importance of attention and co-occurrence features. Building on the spatial temporal graph convolutional network (ST-GCN), an attentional branch and a co-occurrence feature learning branch suited to it are proposed, forming a multi-task framework for action recognition. Experiments show that the proposed algorithm performs far better than ST-GCN and other mainstream algorithms. Ablation experiments demonstrate and explain the effectiveness of each branch.

2. For video-based action recognition, this thesis addresses the problem that two-stream convolutional networks require optical flow to be computed in advance, which slows computation. A strategy combining a motion filter with random cross-frame fusion is proposed to replace optical flow. Specifically, the motion filter models the temporal relationships of actions, and random cross-frame fusion further extracts high-level temporal features. Experiments show that the proposed network is faster than the two-stream convolutional network, and that its recognition accuracy is competitive with current algorithms that replace optical flow.

3. Human pose estimation is closely related to action recognition. This thesis summarizes OpenPose, a representative bottom-up algorithm, and AlphaPose, a representative top-down algorithm, and compares their performance in a series of experiments. Finally, OpenPose is used to extract skeleton sequences from action videos, so that the videos can be classified with the skeleton sequence-based action recognition algorithm.
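As a rough illustration of the skeleton branch, the sketch below shows how a skeleton sequence can be stored as frames of joint coordinates and passed through one spatial graph convolution with the symmetric normalization D^{-1/2}(A + I)D^{-1/2}, the simplest form used in graph convolutional networks such as ST-GCN. It is not the thesis's implementation: the five-joint skeleton, the edge list, the weights and all function names are hypothetical, and the temporal convolution, attentional branch and co-occurrence branch are omitted.

```python
# Minimal sketch of one spatial graph convolution over a skeleton sequence.
# Assumptions: a toy 5-joint skeleton and hand-picked weights (both hypothetical).

# Undirected bones of the toy skeleton: joint 1 (neck) connects to all others.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} as a nested list of floats."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop, the "+ I" term
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    degree = [sum(row) for row in a]
    return [
        [a[i][j] / (degree[i] ** 0.5 * degree[j] ** 0.5) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def spatial_graph_conv(frame, adj, weight):
    """Apply adj @ frame @ weight to one frame.

    frame:  V x C_in  (one feature vector per joint, e.g. x, y, z)
    weight: C_in x C_out
    returns V x C_out
    """
    num_joints, c_in, c_out = len(frame), len(weight), len(weight[0])
    # Step 1: each joint aggregates features from itself and its neighbours.
    aggregated = [
        [sum(adj[i][k] * frame[k][c] for k in range(num_joints)) for c in range(c_in)]
        for i in range(num_joints)
    ]
    # Step 2: a shared linear map projects every joint's channels.
    return [
        [sum(aggregated[i][c] * weight[c][o] for c in range(c_in)) for o in range(c_out)]
        for i in range(num_joints)
    ]


def conv_sequence(sequence, adj, weight):
    """Apply the same spatial graph convolution to every frame (T x V x C_in)."""
    return [spatial_graph_conv(frame, adj, weight) for frame in sequence]


if __name__ == "__main__":
    adj = normalized_adjacency(NUM_JOINTS, EDGES)
    # Two frames, five joints, 3-D coordinates per joint (made-up values).
    sequence = [
        [[0.0, 1.8, 0.0], [0.0, 1.5, 0.0], [0.0, 1.0, 0.0], [-0.5, 1.2, 0.0], [0.5, 1.2, 0.0]],
        [[0.0, 1.8, 0.1], [0.0, 1.5, 0.1], [0.0, 1.0, 0.1], [-0.5, 1.4, 0.1], [0.5, 1.0, 0.1]],
    ]
    weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 input channels -> 2 output channels
    features = conv_sequence(sequence, adj, weight)
    print(len(features), len(features[0]), len(features[0][0]))
```

In a full model of this kind, the per-frame outputs would then be combined along the time axis by a temporal convolution, which this sketch leaves out.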
Keywords/Search Tags: Human Pose Estimation, Spatial Temporal Graph Convolutional Network, Attentional Branch, Co-occurrence Feature Learning Branch, Multi-task Framework, Two Stream Convolutional Network, Motion Filter, Random Cross-frame Fusion