Study on Human Pose Estimation, Tracking and Human Action Recognition in Videos

Posted on: 2018-09-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Ma
GTID: 1318330518983826
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
As robots become increasingly involved in daily human production and life, human-machine interaction is attracting growing attention. The ability of robots to perceive their environment is one of the core problems in human-machine interaction. Like human beings, robots can perceive the ambient environment through seeing, smelling, hearing, and touching. In current robot applications, the most commonly used sensing modalities are vision, hearing, and touch, among which vision plays the most important role, supplying 80% of ambient information. Robot vision research is therefore attracting more and more attention worldwide.

Robots used for human-machine interaction are usually equipped with color cameras to observe the ambient environment. In this thesis, we propose methods that extract human pose and action information from videos so that robots can recognize and understand human actions and activities. This work will help to build feature databases for human actions rapidly, and will provide basic data and models to support humanoid robot action planning and human-computer interaction. An offline human pose estimation method and an online human pose tracking method are presented. Building on the resulting pose estimates, a human physical action recognition method and a fine-grained action recognition method are proposed. The main contents and innovations of the thesis are as follows:

1. A local-global layered human pose model is constructed, which can be decomposed and recomposed. The model includes a global layer and a local layer. The global layer describes the entire human upper body, while the local layer models the pose of each body part. The highlight of the model is that, when optimizing human poses, the pose of each body part is optimized separately, and the optimized part poses are then combined to obtain the optimal whole-body pose in the global layer. Conversely, the whole-body pose can in turn be adjusted in the global layer. In this way, the model makes the human pose optimization problem more targeted.

2. A dynamic multi-layered algorithm is proposed for human pose estimation. The algorithm consists of five layers, and the parameters and data in each layer change dynamically during operation. The algorithm evaluates the consistency of human pose across the video sequence by constructing dummy poses in adjacent frames. In addition, Particle Swarm Optimization is used to optimize the human pose over a limited number of pose candidates, which reduces computation time.

3. An online human pose estimation and tracking algorithm is proposed. To obtain reliable target information, the algorithm begins tracking with an initialization method that obtains the human pose in the first frame of the video. It then uses appearance and motion information to track the human pose through the video, adjusting and correcting poses while tracking. Furthermore, to guarantee pose consistency across the video, an adaptive penalty cost function is constructed from video motion and appearance information.

4. A human physical action recognition method based on multi-level image sequences and sub-video segmentation is proposed. Human pose estimation results are used to extract human region patches, yielding four image patch sequences. A binary tree is used to segment the video into sub-sequences from coarse to fine. Finally, Convolutional Neural Networks (CNNs) process the image patches to produce the final human action descriptors for videos.

5. A human fine-grained action recognition method is proposed. This method seeks more useful fine-grained action information. Human operation areas are extracted so that more informative pixels are included. All patch sequences are processed with CNN structures, and an encoding method is applied to the CNN pooling layer outputs to obtain discriminative fine-grained action descriptors for videos.
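
The coarse-to-fine binary-tree segmentation mentioned in contribution 4 can be illustrated with a minimal sketch. The abstract does not state how the split points are chosen, so this example assumes each sub-sequence is simply halved at its midpoint; the function name `binary_tree_segments` and the `depth` parameter are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple

# A segment is a half-open frame range [start, end).
Segment = Tuple[int, int]


def binary_tree_segments(num_frames: int, depth: int) -> List[List[Segment]]:
    """Split a video of num_frames frames into sub-sequences, coarse to fine.

    Level 0 holds the whole video; each further level halves every segment
    of the previous level at its midpoint (an assumption of this sketch).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if depth < 0:
        raise ValueError("depth must be non-negative")

    levels: List[List[Segment]] = [[(0, num_frames)]]
    for _ in range(depth):
        next_level: List[Segment] = []
        for start, end in levels[-1]:
            if end - start < 2:
                # A single frame cannot be split further; carry it down.
                next_level.append((start, end))
                continue
            mid = (start + end) // 2
            next_level.append((start, mid))
            next_level.append((mid, end))
        levels.append(next_level)
    return levels


if __name__ == "__main__":
    for level, segments in enumerate(binary_tree_segments(8, 2)):
        print(level, segments)
```

Under these assumptions, each level's segments could be described separately (for example, by pooling per-frame CNN features within a segment) and the per-level descriptors concatenated into a video-level descriptor; that aggregation step is likewise illustrative rather than the thesis's stated procedure.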
Keywords/Search Tags: human pose model, pose estimation, action recognition, video understanding