
Research On Skeleton Based Human Action Recognition

Posted on: 2018-11-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z Z Wu
Full Text: PDF
GTID: 1318330518997799
Subject: Computer application technology
Abstract/Summary:
Human action recognition is an active research field in computer vision, with the goal of automatically identifying human behavior in a given scene. It has many potential applications, including content-based video search, human-computer interaction, video surveillance, and sports video analysis. The human body can be regarded as an articulated system of rigid segments (limbs) connected by joints; human actions are composed of the movements of these rigid segments and can be represented by the movement of the skeletal joints in 3D space. With the development of depth cameras such as Kinect and of skeleton extraction technology, research on skeleton-based human action recognition is gradually emerging.

Studies of skeleton-based action recognition mainly cover three-dimensional skeleton feature representation, dynamic temporal registration, joint feature learning from multi-source data, and key frame analysis. In frame-level feature extraction, the category and temporal characteristics of the sequence to which each frame belongs are often neglected, which makes it difficult to capture the subtle but meaningful differences between actions. In dynamic temporal registration, the traditional method based on dynamic time warping suffers from serious mismatching when the sequences contain periodic fragments, and it handles intra-class and inter-class sequences uniformly, so the generated implicit templates have weak discriminative ability. For some complex or very similar actions, algorithms that rely only on skeleton data often struggle to identify the action accurately. To address this problem, related research recognizes actions by combining other data sources, such as depth map sequences and RGB image sequences.
However, this work often uses only a linear combination of multi-source features, so the improvement in final recognition performance is very limited, and sometimes the linear combination even has a negative effect. In addition, most research on key frame extraction is based on pre-designed criteria with weak semantic features, which makes it difficult to adapt to different scenes or tasks.

Against this background, this dissertation first gives a brief overview of the research background and significance of human action recognition algorithms and of the state of research in China and abroad, and then sets out its basic ideas and research approach. It then studies four topics in depth: three-dimensional skeleton feature representation, dynamic temporal registration, joint feature learning from multi-source data, and temporal visual attention selection. In short, this dissertation makes the following contributions:

(1) Existing work rarely considers temporal and category information when extracting frame-level features. This dissertation proposes a new algorithm, the denoising Auto-encoder with constraints of temporal and category (DAE-CTC). By adding temporal and category constraint terms to the unsupervised feature learning process, the learned features become more robust. Compared with other approaches, DAE-CTC achieves higher accuracy in action recognition experiments.

(2) Existing temporal registration algorithms impose a strict temporal-order restriction and may be completely ineffective when dealing with periodic actions. This dissertation proposes a new temporal registration method based on local constraints, called LRWS.
For computing the implicit action template, LRWS not only keeps the differences among intra-class sequences within a stable and small range, but also strengthens the differences from inter-class sequences. Compared with the other algorithms in the comparison, the proposed method performs better.

(3) To address the inaccurate recognition by existing algorithms of very similar or overlapping actions, a multi-source joint feature learning algorithm that incorporates depth maps is proposed, namely the deep multi-model Auto-encoder (DMAE). DMAE uses DAE-CTC and CAE to extract the hidden-layer features of the skeleton and the depth map, respectively. A two-layer neural network then models the feature representations nonlinearly, and finally a BP-NN is used to optimize the whole network. The resulting skeleton joint features and depth map features show better recognition capability, especially for complex and very similar actions. In addition, DMAE has a strong ability to reconstruct the skeleton in the presence of interference.

(4) To address temporal interference and computational redundancy in human action recognition, we propose a new recursive temporal sparse Auto-encoder, TSAE, for temporal visual attention selection. The method exploits the characteristics of the LSTM output gate and adds low-rank and sparse constraints, which adaptively reduces the amount of computation and improves the accuracy of action recognition. Furthermore, TSAE has significant advantages over the other compared methods in terms of time performance.

The studies in this dissertation offer feasible solutions to several critical issues in skeleton-based action recognition. We propose several new methods: a three-dimensional skeleton feature representation learning algorithm, a dynamic temporal registration algorithm, a multi-source joint feature optimal learning algorithm, and a temporal visual attention selection mechanism.
Extensive experiments on real data sets verify the validity of the proposed algorithms, and the work broadens the thinking for the further application and development of human action recognition research.
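
The abstract contrasts its registration method with traditional dynamic time warping (DTW). For reference, the block below is a minimal sketch of classic DTW only, not of LRWS, whose details the abstract does not give. The function name `dtw_distance`, the Euclidean frame distance, and the input layout (one row per frame, one column per joint coordinate) are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW alignment cost between two frame sequences.

    a, b: array-likes of shape (n, d) and (m, d); 1-D inputs are
    treated as sequences of scalar frames. Illustrative sketch only.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)

    # Pairwise Euclidean distance between every frame of a and of b.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # acc[i, j] = minimal cost of aligning a[:i] with b[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Only monotonic steps are allowed: advance a, b, or both.
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]
```

For example, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is zero because the second sequence is only a time-stretched copy of the first. The recurrence permits only monotonic alignments from the first frame to the last, which appears to be the kind of ordering restriction the abstract identifies as problematic for periodic actions.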
Keywords/Search Tags: Action Recognition, Skeleton, Auto-encoder, Temporal Registration, Non-linear Mapping, Neural Network