Human Action Recognition By Fusing Video Information And Skeletal Point Data Based On Kinect Platform

Posted on: 2018-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Liu
GTID: 2348330512481829
Subject: Computer Science and Technology

Abstract/Summary:
Human action recognition is an active research topic in computer vision and has been widely used in intelligent robots, video surveillance, and other intelligent systems. Because of the complexity of human motion and of the scene, many difficulties and challenges remain. Features extracted from traditional single-modal data have limited expressive power, which restricts recognition accuracy, especially for similar actions and when training samples are scarce. This thesis studies a human action recognition method that fuses the video information and skeletal data captured on the Kinect platform. The main work covers the following aspects.

In terms of data preparation, the hardware composition of the Kinect and its principle of data analysis are studied, and on that basis motion-related video and skeletal data are acquired synchronously. As an extension, a synchronous data acquisition system for two Kinect modules is developed, which provides a platform for acquiring data from different angles in subsequent research.

In terms of feature representation, two kinds of action features are studied, one from the Kinect skeletal data and one from the video data. For the skeleton-based feature, we compute the angles between vectors, including the velocity vectors describing the motion of the joint points and the human structure vectors. This feature expresses latent dynamic information and improves the discriminative power of the human posture representation. For the video-based feature, a region of interest is first located using the skeletal data and SURF feature descriptors are extracted from it; a bag-of-words model is then used to represent the features related to the human motion. This feature supplies additional discriminative information when the human skeleton model is deformed by occlusion, and it helps improve recognition accuracy. The experimental results show the effectiveness of the representation.

In terms of recognition methods, different methods are first studied for different recognition tasks on single-modal data, especially skeletal data, which also verifies the effectiveness of the skeleton feature. How to fuse effective video information is then studied for the case in which the human skeleton model is deformed by occlusion.

For recognition based on skeletal data, a method that matches multiple intra-class templates in the maximum-separability space of PCA is studied for the case of large within-class differences and small between-class differences. To account for intra-class differences, several subclasses are first extracted from each action category to construct a training template set. The test sample and the training templates are then projected into the space of maximum separability, where matching and voting are used to recognize the action. For the recognition of similar actions, a method based on the bag-of-words model combined with GMM soft assignment is proposed. A pose dictionary is first constructed and the GMM is then introduced, so that similar poses are softly assigned to different key poses according to their posterior probabilities. To address the class balance problem, the algorithm is optimized by adjusting the weights of different categories. For the case of insufficient training samples, following the small-sample idea of one-shot learning, several key poses are extracted from a single set of training samples by the k-nearest-neighbors method. After filtering, the key poses with strong discriminative ability are retained and cast weighted votes for the action categories according to their contribution rates. Real-time online testing is also conducted, and good performance is obtained.

For the fusion of video information, a fusion mechanism between the feature layer and the decision layer is proposed on the basis of the above methods. The data of the two modalities are first scored separately for each action category and the scores are then aggregated, so that recognition is completed in the decision layer. The experimental results show that the proposed fusion method is feasible and flexible.
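
The skeleton-based feature described in the abstract, the angle between a joint's velocity vector and a human structure vector, can be illustrated with a minimal sketch. The abstract does not give the exact definition, so the details here are assumptions: the velocity is taken as the frame-to-frame displacement of a joint, the structure vector as the bone from a parent joint to that joint, and a stationary joint is assigned an angle of zero. The function names and the joint names are hypothetical.

```python
import math


def angle_between(u, v):
    """Angle in radians between two 3-D vectors (0.0 if either is near zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u < 1e-9 or norm_v < 1e-9:
        return 0.0  # assumption: a stationary joint contributes a zero angle
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def velocity_structure_angles(prev_frame, curr_frame, bones):
    """Return one angle per bone for a pair of consecutive skeleton frames.

    prev_frame, curr_frame: dict mapping joint name -> (x, y, z)
    bones: list of (parent_joint, child_joint) pairs (hypothetical layout)
    """
    features = []
    for parent, child in bones:
        # Velocity vector: displacement of the child joint between frames.
        velocity = tuple(c - p for c, p in zip(curr_frame[child], prev_frame[child]))
        # Structure vector: bone from the parent joint to the child joint.
        structure = tuple(c - p for c, p in zip(curr_frame[child], curr_frame[parent]))
        features.append(angle_between(velocity, structure))
    return features
```

Calling `velocity_structure_angles` on every pair of consecutive frames yields one angle vector per frame pair; how the thesis assembles these into a final descriptor is not stated in the abstract.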
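
The GMM soft assignment mentioned in the abstract can be sketched as follows. Each key pose is treated as one Gaussian component, and every observed pose spreads its vote over the key poses in proportion to its posterior probability instead of voting for a single nearest key pose. This is only an illustration under simplifying assumptions: the components are isotropic (one variance each), the parameters are taken as given rather than fitted, and the class-weight adjustment described in the abstract is omitted. All names are hypothetical.

```python
import math


def posteriors(pose, means, variances, weights):
    """Posterior probability of each key pose (GMM component) for one pose."""
    dim = len(pose)
    log_terms = []
    for mean, var, weight in zip(means, variances, weights):
        sq_dist = sum((x - m) ** 2 for x, m in zip(pose, mean))
        # Log-density of an isotropic Gaussian with variance `var`.
        log_density = -0.5 * (sq_dist / var + dim * math.log(2.0 * math.pi * var))
        log_terms.append(math.log(weight) + log_density)
    top = max(log_terms)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def soft_histogram(poses, means, variances, weights):
    """Bag-of-words histogram in which each pose is softly assigned."""
    if not poses:
        raise ValueError("at least one pose is required")
    histogram = [0.0] * len(means)
    for pose in poses:
        for k, p in enumerate(posteriors(pose, means, variances, weights)):
            histogram[k] += p
    return [h / len(poses) for h in histogram]
```

A pose that lies halfway between two key poses contributes equally to both bins, which is the behaviour that makes soft assignment useful for actions whose poses are similar.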
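
The final step, in which each modality scores every action category and the scores are aggregated before a decision is made, can be sketched as below. The abstract does not specify the aggregation rule, so the weighted sum of normalized scores used here is an assumption, as are the function names, the `skeleton_weight` parameter, and the requirement that scores be non-negative.

```python
def normalize(scores):
    """Scale non-negative class scores so that they sum to one."""
    total = sum(scores.values())
    if total <= 0:
        # No evidence from this modality: fall back to a uniform distribution.
        return {label: 1.0 / len(scores) for label in scores}
    return {label: s / total for label, s in scores.items()}


def fuse_scores(skeleton_scores, video_scores, skeleton_weight=0.5):
    """Aggregate per-class scores from two modalities and pick a label.

    skeleton_scores, video_scores: dict mapping action label -> score
    skeleton_weight: assumed mixing weight in [0, 1] for the skeleton modality
    """
    if set(skeleton_scores) != set(video_scores):
        raise ValueError("both modalities must score the same action labels")
    skeleton = normalize(skeleton_scores)
    video = normalize(video_scores)
    fused = {
        label: skeleton_weight * skeleton[label]
        + (1.0 - skeleton_weight) * video[label]
        for label in skeleton
    }
    best_label = max(fused, key=fused.get)
    return best_label, fused
```

Raising `skeleton_weight` favours the skeleton modality when tracking is reliable, and lowering it lets the video features dominate when the skeleton is deformed by occlusion.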
Keywords/Search Tags: Action Recognition, Bag of words, Fuse, Kinect