
Research On Key Techniques Of 4D Human Action Recognition

Posted on: 2019-05-21
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhao
Full Text: PDF
GTID: 1318330569987416
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, human motion recognition has attracted widespread attention in the academic community because of its broad application prospects. Researchers in related fields at home and abroad have proposed a large number of methods and achieved notable results. In 2010, Microsoft released the Kinect depth sensor, which made it possible to use depth information widely in human motion recognition. Kinect obtains depth data from emitted infrared light reflected off object surfaces, so data in 4D (x-y-z-t) space can be captured. Motion feature extraction is the most important part of human motion recognition: the representational power of the features directly determines the performance of the entire recognition system. Building on current research in human motion recognition, this dissertation conducts a series of studies on the key techniques of feature extraction for 4D human motion recognition. The main work and results are summarized as follows:

(1) There is extensive research on spatio-temporal interest point algorithms and local feature descriptors based on color information, but relatively little on extracting interest points and feature descriptors from the depth channel. The related studies that do exist suffer from problems such as overly sparse interest points and feature descriptors that make insufficient use of spatial information. To solve these problems, this dissertation proposes an algorithm for extracting depth spatio-temporal interest points based on the 4D Hessian matrix. The algorithm extracts dense interest points from depth video and is also invariant to spatial scale. To suit the characteristics of depth data, a Local Depth Pattern (LDP) feature algorithm is proposed that makes full use of the spatial relationships between pixels. In addition, to better fuse color features and depth features, and on the premise that the color
images and depth images acquired by Kinect match each other, a method of sharing the coordinates of interest points is proposed that combines the feature descriptors of the color and depth channels. The effects of different combinations are analyzed, and the best combination is selected to recognize actions, providing a new idea for 4D human motion recognition research.

(2) Existing local feature descriptors based on depth information often focus only on the spatial dimension and neglect the time dimension. To address this, this dissertation first calibrates the depth data and then proposes a simple and effective 3D sparse descriptor, in which intra-frame and inter-frame pixels are sampled to extract motion features. Geometric constraints on features are among the most common constraints in human motion recognition; they can greatly reduce the search space and improve recognition performance. The traditional pyramid matching algorithm constrains the target only in two-dimensional space and time. For sparse video, this dissertation proposes a space-time-depth pyramid matching algorithm based on sparse coding. Sparse coding of the 3D sparse descriptors, followed by pyramid matching of depth videos along the space, time, and depth dimensions, effectively improves the accuracy of 4D human motion recognition. The method generalizes well and performs consistently across different databases, on which it essentially reaches the state of the art.

(3) Most deep-learning-based methods for 4D human motion recognition feed the depth map into the deep learning framework as a grayscale image, without using information in the time domain. This dissertation therefore proposes using depth-field optical flow as an input to the convolutional neural network for 4D human motion recognition. Moreover, traditional deep learning architectures often use only one data modality and do not combine color and depth. In this dissertation, the traditional
dual-channel convolutional neural network is extended from the color channel to the depth channel. In addition to color images and optical flow, depth images and depth-field optical flow are also used, yielding a multi-channel convolutional neural network that combines color and depth information. By merging the information from different channels, the proposed algorithm achieves the best results on the UTKinect database. Different channel combinations are also compared, identifying a combination that balances recognition rate against network complexity and thus offers a compromise between performance and computation.

(4) In real-world human motion, different camera angles and body poses cause very large changes in visual appearance, and the training samples do not sample the pose parameters uniformly. Existing approaches generally use multiple cameras to shoot from multiple angles when recording a database; when a single camera is used, recordings are often required to be taken from a uniform viewpoint. These approaches fail when faced with real-world scenes. To solve this problem, this dissertation proposes a viewpoint-independent feature algorithm based on skeleton distribution characteristics. It determines the orientation of the human body from statistics of the distribution of the skeleton joints, normalizes the coordinates according to that orientation to generate a new coordinate system, and converts the three-dimensional coordinates into a spherical coordinate system to obtain a skeleton map. The skeleton map is fed into a 3D convolutional neural network, which completes the recognition of 4D human motion. The experimental results show that the proposed algorithm is robust to viewpoint changes. Even without optimizing the deep learning network, the proposed algorithm essentially reaches state-of-the-art performance on the NTU RGB+D
database.

In summary, this dissertation studies feature extraction for 4D human motion recognition based on color, depth, and skeletal information, and proposes effective action feature extraction methods. It also explores interest point extraction, multi-channel feature description and fusion, geometric constraints, and viewpoint independence, providing new ideas for the study of human motion recognition. The proposed algorithms are verified on several popular 4D human motion recognition databases. The experimental results show their effectiveness, and the research results can be extended to a wider range of applications.
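
As an illustration of contribution (2), the following is a minimal sketch of pooling sparse codes over a pyramid that partitions space, depth, and time. It is not the dissertation's implementation: the function name `std_pyramid_pool`, the max-pooling rule (borrowed from common sparse-coding pyramid matching practice), the choice to halve all four axes at each level, and the assumption that coordinates are pre-normalized to [0, 1] are all assumptions made here for illustration.

```python
from itertools import product


def std_pyramid_pool(points, code_len, levels=1):
    """Max-pool sparse codes over a space-time-depth pyramid (illustrative).

    points   : list of (x, y, z, t, code); x, y, z, t are assumed to be
               normalized to [0, 1], and code is a list of code_len floats
               (the sparse code of one local descriptor).
    levels   : pyramid depth; level l splits every axis into 2**l bins.
    Returns one flat feature vector: the pooled codes of every cell at
    every level, concatenated in a fixed order.
    """
    pooled = []
    for level in range(levels + 1):
        bins = 2 ** level
        # One pooled vector per 4D cell at this level.
        cells = {idx: [0.0] * code_len
                 for idx in product(range(bins), repeat=4)}
        for x, y, z, t, code in points:
            # Clamp so a coordinate of exactly 1.0 falls in the last bin.
            idx = tuple(min(int(v * bins), bins - 1) for v in (x, y, z, t))
            cell = cells[idx]
            for k, value in enumerate(code):
                cell[k] = max(cell[k], abs(value))  # max pooling (assumed)
        for idx in sorted(cells):
            pooled.extend(cells[idx])
    return pooled


if __name__ == "__main__":
    demo = [
        (0.1, 0.1, 0.1, 0.1, [0.5, 0.0]),
        (0.9, 0.9, 0.9, 0.9, [0.0, -2.0]),
    ]
    feature = std_pyramid_pool(demo, code_len=2, levels=1)
    print(len(feature))  # (1 + 16) cells * 2 code entries
```

Two videos can then be compared through these concatenated vectors, for example with a linear classifier. Adding the depth axis to the partition is what distinguishes this from a pyramid that constrains only image space and time.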
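
Contribution (3) merges information from color images, optical flow, depth images, and depth-field optical flow. The abstract does not state how the channels are merged, so the sketch below assumes the simplest option, score-level (late) fusion: each stream's class scores are turned into probabilities and averaged. The stream names, the function `fuse_streams`, and the optional per-stream weights are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_scores, weights=None):
    """Weighted average of per-stream class probabilities (assumed fusion rule).

    stream_scores : dict mapping a stream name to its list of class scores;
                    every stream must score the same classes in the same order.
    weights       : optional dict of non-negative per-stream weights;
                    defaults to equal weighting.
    """
    names = sorted(stream_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    norm = sum(weights[name] for name in names)
    num_classes = len(stream_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_scores[name])
        for c, p in enumerate(probs):
            fused[c] += weights[name] * p / norm
    return fused


if __name__ == "__main__":
    scores = {
        "rgb": [2.0, 1.0, 0.0],
        "rgb_flow": [0.0, 3.0, 0.0],
        "depth": [1.0, 2.0, 0.0],
        "depth_flow": [0.0, 2.0, 1.0],
    }
    fused = fuse_streams(scores)
    print(fused.index(max(fused)))  # index of the predicted class
```

Dropping a stream from the dictionary, or setting its weight to zero, gives a quick way to compare channel combinations of the kind the abstract describes, trading recognition rate against the cost of running more networks.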
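
The viewpoint normalization in contribution (4) can be sketched as follows. The abstract says only that body orientation is determined from the distribution of skeleton joints; here that step is approximated, as an assumption, by the principal axis of the joints projected onto the ground plane. The function names, the convention that y is the vertical axis, and the sample joint positions are all hypothetical.

```python
import math


def normalize_view(joints):
    """Center a skeleton and rotate it about the vertical (y) axis so that
    the principal horizontal axis of the joint distribution lies along x.

    joints : list of (x, y, z) joint positions, with y assumed vertical.
    """
    n = len(joints)
    cx = sum(j[0] for j in joints) / n
    cy = sum(j[1] for j in joints) / n
    cz = sum(j[2] for j in joints) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in joints]

    # Second moments of the joints in the ground (x-z) plane.
    sxx = sum(x * x for x, _, _ in centered)
    szz = sum(z * z for _, _, z in centered)
    sxz = sum(x * z for x, _, z in centered)
    # Angle of the principal axis, measured from the x axis (2D PCA).
    theta = 0.5 * math.atan2(2.0 * sxz, sxx - szz)

    c, s = math.cos(theta), math.sin(theta)
    # Rotate by -theta so the principal axis maps onto the x axis.
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in centered]


def to_spherical(point):
    """Convert (x, y, z) to (radius, polar angle from +y, azimuth in x-z)."""
    x, y, z = point
    r = math.sqrt(x * x + y * y + z * z)
    polar = math.acos(y / r) if r > 0.0 else 0.0
    azimuth = math.atan2(z, x)
    return (r, polar, azimuth)


def skeleton_to_spherical(joints):
    """View-normalize a skeleton, then express each joint in spherical form."""
    return [to_spherical(p) for p in normalize_view(joints)]


if __name__ == "__main__":
    skeleton = [
        (0.0, 1.7, 0.0), (-0.2, 1.4, 0.0), (0.2, 1.4, 0.0),
        (-0.1, 0.9, 0.05), (0.1, 0.9, 0.05), (0.0, 0.0, -0.1),
    ]
    for row in skeleton_to_spherical(skeleton):
        print(row)
```

A principal axis has no sign, so this estimate cannot tell a body from the same body turned by 180 degrees; a real system would need an extra cue, such as which side the left and right joints fall on, to resolve that. Stacking the per-frame spherical coordinates over time gives an array in the spirit of the skeleton map described above.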
Keywords/Search Tags: 4D human action recognition, Kinect, feature extraction, multi-feature fusion