Human pose estimation is a fundamental problem in the computer vision community, and it forms the cornerstone of a series of downstream tasks such as action recognition, human parsing, and pose tracking. It aims to detect and localize the body joints of all people in the input data. Video-based human pose estimation supports numerous applications, including security, violence detection, human-machine interaction, and augmented reality. However, frequent pose occlusion and motion blur in videos, as well as the time-consuming and labor-intensive manual annotation, dramatically increase the complexity of this task. Currently, many works focus on human pose estimation in static images. These approaches inherently have difficulty leveraging temporal context across video frames and rely heavily on the visual features of the current frame. Consequently, they usually fail in scenes with pose occlusion and motion blur, which leads to inaccurate keypoint detection. On the other hand, existing methods are generally trained on densely labeled datasets, neglecting the cost of collecting and annotating video data. This paper emphasizes temporal consistency in video-based human pose estimation, focusing on the following two methods:

(1) A deep dual consecutive network for human pose estimation. We employ consecutive video frames from dual temporal directions as supporting frames and extract temporal information to improve pose estimation of the current frame. In particular, we design three components to implement the network. A Pose Temporal Merger encodes keypoint spatiotemporal context to generate effective search scopes. A Pose Residual Fusion module computes motion cues over the short term. Finally, a Pose Correction Network comprising multi-granularity deformable convolutions is proposed to resample keypoint heatmaps within the localized search scopes.

(2) A multi-stream inference network for human pose estimation. To address the problem that existing approaches rely heavily on the visual cues of the current frame, we design a novel multi-stream inference network. The network incorporates bi-directional pose forecasts that are independent of the current frame's visual features, providing a strong complement to the visual detection results. Furthermore, considering the difficulty and high cost of labeling video datasets, we extend the network to sparsely labeled video scenes (pose annotations are given every N frames). The extended network can accurately predict the pose sequences of an entire video from a few annotated frames at test time, thus simplifying the annotation process.

Experimental results demonstrate that exploiting temporal information effectively improves the accuracy of keypoint detection in videos, significantly outperforming existing state-of-the-art pose estimation methods on multiple benchmark datasets. Additionally, when applying the proposed method to sparsely labeled video scenes, we still achieve remarkable results at large temporal intervals.
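To make the three components of method (1) concrete, the following is a minimal PyTorch sketch of how they could fit together. It is illustrative only, not the released implementation: the class names (PoseTemporalMerger, PoseResidualFusion, PoseCorrectionNetwork), the heatmap shapes, and the use of a single deformable-convolution granularity (the method uses multiple granularities) are all simplifying assumptions.

    # Hypothetical sketch of the dual-consecutive pipeline; names are
    # illustrative assumptions. Assumes per-frame keypoint heatmaps of
    # shape (B, K, H, W) produced by any backbone detector.
    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class PoseTemporalMerger(nn.Module):
        """Merge heatmaps from frames t-1, t, t+1 into a localized search scope."""
        def __init__(self, num_joints):
            super().__init__()
            self.fuse = nn.Conv2d(3 * num_joints, num_joints, 3, padding=1)

        def forward(self, h_prev, h_curr, h_next):
            return self.fuse(torch.cat([h_prev, h_curr, h_next], dim=1))

    class PoseResidualFusion(nn.Module):
        """Short-term motion cues from heatmap residuals in both directions."""
        def __init__(self, num_joints):
            super().__init__()
            self.fuse = nn.Conv2d(2 * num_joints, num_joints, 3, padding=1)

        def forward(self, h_prev, h_curr, h_next):
            return self.fuse(torch.cat([h_curr - h_prev, h_next - h_curr], dim=1))

    class PoseCorrectionNetwork(nn.Module):
        """Resample heatmaps inside the search scope with a deformable
        convolution whose offsets are predicted from the motion cues
        (single granularity here, for brevity)."""
        def __init__(self, num_joints, kernel_size=3):
            super().__init__()
            pad = kernel_size // 2
            self.offset = nn.Conv2d(num_joints, 2 * kernel_size * kernel_size,
                                    kernel_size, padding=pad)
            self.dcn = DeformConv2d(num_joints, num_joints, kernel_size,
                                    padding=pad)

        def forward(self, scope, motion):
            return self.dcn(scope, self.offset(motion))

    # Usage: refine the current frame's heatmaps using its two neighbours.
    K = 17                                              # e.g. COCO keypoints
    h = [torch.randn(2, K, 96, 72) for _ in range(3)]   # heatmaps for t-1, t, t+1
    scope = PoseTemporalMerger(K)(*h)
    motion = PoseResidualFusion(K)(*h)
    refined = PoseCorrectionNetwork(K)(scope, motion)   # (2, 17, 96, 72)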
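Similarly, the multi-stream idea in method (2) can be sketched under stated assumptions: a visual stream detects poses from the current frame, while forward and backward forecasting streams predict the same poses purely from neighbouring frames' heatmaps, and a learned fusion combines the three. All names here (PoseForecaster, MultiStreamFusion, window) are hypothetical.

    # Illustrative sketch of multi-stream inference; not the paper's code.
    import torch
    import torch.nn as nn

    class PoseForecaster(nn.Module):
        """Predict frame-t heatmaps from a window of neighbouring heatmaps,
        i.e. without looking at frame t's pixels."""
        def __init__(self, num_joints, window=2):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(window * num_joints, 64, 3, padding=1), nn.ReLU(),
                nn.Conv2d(64, num_joints, 3, padding=1))

        def forward(self, neighbour_heatmaps):          # list of (B, K, H, W)
            return self.net(torch.cat(neighbour_heatmaps, dim=1))

    class MultiStreamFusion(nn.Module):
        """Combine the visual, forward, and backward streams per joint."""
        def __init__(self, num_joints):
            super().__init__()
            self.fuse = nn.Conv2d(3 * num_joints, num_joints, 1)

        def forward(self, visual, fwd, bwd):
            return self.fuse(torch.cat([visual, fwd, bwd], dim=1))

    K = 17
    past   = [torch.randn(1, K, 96, 72) for _ in range(2)]   # frames t-2, t-1
    future = [torch.randn(1, K, 96, 72) for _ in range(2)]   # frames t+1, t+2
    visual = torch.randn(1, K, 96, 72)                       # detector on frame t
    fwd = PoseForecaster(K)(past)       # forecast t from the past
    bwd = PoseForecaster(K)(future)     # forecast t from the future
    out = MultiStreamFusion(K)(visual, fwd, bwd)

Because the two forecasting streams never read the current frame's pixels, they remain informative when that frame is occluded or blurred; in the sparsely labeled setting, the same streams would propagate poses between the annotated frames.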