
A Research Of Pose Estimation And Action Recognition In Action Digitization

Posted on: 2021-02-26  Degree: Master  Type: Thesis
Country: China  Candidate: Y Deng  Full Text: PDF
GTID: 2428330623467788  Subject: Computer Science and Technology
Abstract/Summary:
Scientists have long explored how to give machines the ability to understand human society. With the development of artificial intelligence technologies, this idea has gradually become a reality. For example, in the field of computer vision, object detection and recognition help machines understand the surrounding environment, human pose estimation and action recognition help machines understand human society, and natural language processing helps machines interact with humans. This thesis focuses on two of these tasks: human pose estimation and action recognition.

Currently, multi-person pose estimation methods fall into two types: top-down and bottom-up. A top-down method employs a person detector to locate each human body and then performs single-person pose estimation on each detected person. However, in crowded scenes, the estimation of keypoints inside a person's bounding box is disturbed by body parts of other individuals.

As human pose estimation has developed, action recognition based on human pose has become more and more popular, because a human action is essentially described by joint points and the approach is not affected by image quality. However, using human pose for action recognition has a disadvantage: an end-to-end network cannot be established to learn the two tasks simultaneously.

To study and solve these two problems, the work done in this thesis is summarized as follows:

1. This thesis proposes an efficient approach to multi-person pose estimation in crowded scenes that fuses the top-down and bottom-up methods. First, a body detector is used to locate each human body. Then a single-person pose estimation network predicts all possible human keypoints in the box, including keypoints belonging to other people. Finally, these keypoints are screened and combined into each individual's keypoints by the aggregation algorithm proposed in this thesis.

2. The proposed aggregation algorithm has a tree structure. First, the confidence between adjacent joint points is calculated. Then the path with the greatest confidence from the root node to a leaf node is found. The joint points on this path are the human keypoints to be detected in the current bounding box. The aggregation is not an NP-hard problem, because the candidates only need to be combined into the keypoints of an individual.

3. In action recognition based on human pose estimation, human keypoint coordinates are usually used as the input, with the result that the two tasks cannot be learned online simultaneously. Therefore, this thesis explores using human keypoint heatmaps for action recognition, so that both problems are solved efficiently with a single architecture.

4. This thesis improves a spatio-temporal attention mechanism to make better use of temporal information and joint information. In action recognition, different joint points play different roles in different actions, and information at different points in the time sequence influences recognition differently. Therefore, this thesis proposes a spatio-temporal attention mechanism that assigns different weights to the features of the human pose heatmaps, so that the network pays attention to the more important information. In addition, this thesis combines image features and human pose features for action recognition.
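The maximum-confidence path search described in contribution 2 can be illustrated with a small dynamic-programming sketch. This is not the thesis's implementation: the abstract does not say how the confidence between adjacent joint points is computed, so `toy_conf` (which prefers nearby points) is a purely hypothetical stand-in. The names `best_path` and `levels` are also assumptions, and only a single root-to-leaf chain is handled rather than a full skeleton tree.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def toy_conf(p: Point, q: Point) -> float:
    """Hypothetical pairwise confidence: closer candidates score higher."""
    return 1.0 / (1.0 + math.hypot(p[0] - q[0], p[1] - q[1]))


def best_path(
    levels: Sequence[Sequence[Point]],
    pair_conf: Callable[[Point, Point], float],
) -> Tuple[List[Point], float]:
    """Pick one candidate per joint along a root-to-leaf chain so that the
    sum of confidences between adjacent joints is maximal.

    levels[k] holds every candidate keypoint for the k-th joint in the chain,
    possibly including keypoints that belong to other people in the box.
    """
    if not levels or any(len(c) == 0 for c in levels):
        raise ValueError("every joint needs at least one candidate")

    # score[j]: best cumulative confidence of a path ending at candidate j.
    score = [0.0] * len(levels[0])
    back: List[List[int]] = []  # back[k][j]: best predecessor of candidate j

    for prev, cur in zip(levels, levels[1:]):
        new_score, pointers = [], []
        for q in cur:
            totals = [score[i] + pair_conf(p, q) for i, p in enumerate(prev)]
            i_best = max(range(len(prev)), key=totals.__getitem__)
            new_score.append(totals[i_best])
            pointers.append(i_best)
        score = new_score
        back.append(pointers)

    # Trace the best leaf candidate back to the root.
    j = max(range(len(score)), key=score.__getitem__)
    total = score[j]
    indices = [j]
    for pointers in reversed(back):
        j = pointers[j]
        indices.append(j)
    indices.reverse()
    return [levels[k][i] for k, i in enumerate(indices)], total


# One root candidate, then two candidates per joint (one is an intruder).
levels = [[(0.0, 0.0)], [(0.0, 1.0), (5.0, 5.0)], [(0.0, 2.0), (5.0, 6.0)]]
path, total = best_path(levels, toy_conf)
print(path, total)
```

Because each joint only looks at the candidates of its parent joint, the cost of this sketch grows with the number of candidate pairs between adjacent joints rather than with the number of full combinations, which is consistent with the abstract's remark that the aggregation is not NP-hard.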
Keywords/Search Tags:human pose estimation, aggregation algorithm, action recognition, attention mechanism, action fusion