Research on Action Recognition Based on Spatial-Temporal Stream Convolutional Neural Networks

Posted on: 2018-08-08    Degree: Master    Type: Thesis
Country: China    Candidate: D W Zhao
GTID: 2428330596453019    Subject: Information and Communication Engineering
Abstract/Summary:
Human pose estimation and behavior recognition in video are widely used in intelligent surveillance, medical diagnosis, human-computer interaction and motion analysis, which has made them an active research topic in computer vision. However, the high degrees of freedom of human posture and the complexity of behavioral datasets make the task very difficult, and the difficulty is most pronounced for actions that differ only subtly. Convolutional neural networks (CNNs) have simplified the feature extraction stage of image recognition by removing the need for hand-designed features, and have become a research focus in many fields. Building on the ability of CNNs to learn features by themselves, this thesis improves the classification of subtle movements on complex datasets by refining the input to the network:

(1) In the pose estimation stage of behavior recognition, the NBest algorithm is used to generate N candidate poses for each video frame. The candidates are then decomposed by limb to obtain a larger, limb-based candidate set, and the pose is finally reconstructed by recombining limbs from top to bottom. To localize the wrist and elbow more accurately, a virtual link from each of these parts to the same part in the next frame is introduced. Experiments show that pose reconstruction based on limb decomposition achieves better evaluation performance; in particular, wrist and elbow localization improves noticeably.

(2) In the feature extraction stage, the self-learned features of a deep CNN are used, with training initialized from a model pre-trained on ILSVRC-2012. To fully capture both the static information and the motion information of an action, a network structure in which a spatial stream and a temporal stream learn in parallel is proposed, and actions are then classified by fusing the features of the two streams. Experiments show that feature fusion based on the spatial-temporal network substantially improves recognition performance.

(3) To address the low efficiency of recognizing subtle actions on complex datasets, a multi-part pose segmentation strategy is proposed. Because only the upper body is visible in most of the selected datasets, only the arms and upper body are segmented. To extract optical flow between successive frames of each segment, the segmented images are normalized to a fixed size; the fixed-size RGB images and the optical flow images are then fed to the spatial and temporal streams of the spatial-temporal CNN, respectively. Finally, the extracted features are fused and classified with an SVM. Experimental results on JHMDB and MPII Cooking show that the multi-part segmentation scheme classifies better than a conventional CNN.
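The limb decomposition and top-down recombination in point (1) can be illustrated with a minimal sketch. It assumes the N candidate poses have already been produced (the NBest step itself is not shown) and represents each pose as a dict mapping a joint name to (x, y, score). The upper-body skeleton, the limb score (sum of joint scores) and the distance penalty controlled by `sigma` are all hypothetical choices for illustration, not the scoring used in the thesis; the wrist/elbow virtual link is not modeled.

```python
import math

# Hypothetical upper-body skeleton (an assumption, not the thesis's joint set).
# The first joint of every child limb is shared with the torso and acts as
# the attachment point during top-down recombination.
LIMBS = {
    "torso": ("neck", "l_shoulder", "r_shoulder"),
    "head": ("neck", "head_top"),
    "left_arm": ("l_shoulder", "l_elbow", "l_wrist"),
    "right_arm": ("r_shoulder", "r_elbow", "r_wrist"),
}
CHILDREN = ("head", "left_arm", "right_arm")


def decompose(candidates):
    """Split N full-pose candidates into one candidate pool per limb."""
    pools = {limb: [] for limb in LIMBS}
    for pose in candidates:  # pose maps joint name -> (x, y, score)
        for limb, joints in LIMBS.items():
            pools[limb].append({j: pose[j] for j in joints})
    return pools


def limb_score(part):
    """Hypothetical limb score: the sum of its joint detection scores."""
    return sum(score for _, _, score in part.values())


def attachment_fit(part, anchor, anchor_xy, sigma):
    """Limb score minus a penalty for drifting away from the parent's joint."""
    x, y, _ = part[anchor]
    return limb_score(part) - math.hypot(x - anchor_xy[0], y - anchor_xy[1]) / sigma


def reconstruct(candidates, sigma=10.0):
    """Top-down recombination: pick the best torso, then the best child limbs."""
    pools = decompose(candidates)
    torso = max(pools["torso"], key=limb_score)
    pose = dict(torso)
    for limb in CHILDREN:
        anchor = LIMBS[limb][0]
        anchor_xy = torso[anchor][:2]
        best = max(pools[limb], key=lambda p: attachment_fit(p, anchor, anchor_xy, sigma))
        for joint in LIMBS[limb][1:]:  # the shared joint keeps the torso's estimate
            pose[joint] = best[joint]
    return pose
```

Because limbs are pooled across all N candidates, a frame whose best full pose has a poorly detected arm can still borrow that arm from another candidate: with one candidate that has the stronger torso and another that has the stronger left arm, `reconstruct` keeps the first candidate's torso and right arm and takes the left elbow and wrist from the second.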
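The fusion step in points (2) and (3) can be sketched at the feature level. This assumes each stream has already produced a fixed-length feature vector (the CNNs are not shown) and uses one common scheme, L2-normalizing each stream and concatenating the results; the thesis does not state which fusion rule it uses. Classification is shown as the decision rule of an already trained one-vs-rest linear SVM, with hypothetical weights and biases supplied by the caller; training is out of scope.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit L2 norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(spatial, temporal, w_spatial=1.0, w_temporal=1.0):
    """Feature-level fusion: normalize each stream, weight it, then concatenate.

    `spatial` would come from the RGB stream and `temporal` from the
    optical-flow stream; the weights are hypothetical tuning knobs.
    """
    return ([w_spatial * x for x in l2_normalize(spatial)]
            + [w_temporal * x for x in l2_normalize(temporal)])


def linear_ovr_predict(features, weights, biases):
    """One-vs-rest linear decision: return the class index with the highest w.x + b."""
    scores = [sum(w * x for w, x in zip(w_c, features)) + b
              for w_c, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Normalizing before concatenation keeps one stream from dominating simply because its activations are larger; for example, `fuse_features([3.0, 4.0], [0.0, 2.0])` gives approximately `[0.6, 0.8, 0.0, 1.0]`.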
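The size normalization of segmented parts in point (3) might look like the following sketch. It assumes frames are row-major grids (lists of rows), that a part such as an arm is given by its joint coordinates inside the frame, and that a padded bounding box with nearest-neighbour resizing is an acceptable stand-in; the thesis does not specify how boxes are built or which interpolation is used, and `part_box`, `crop_resize` and `segment_pair` are hypothetical helpers. Optical flow itself is not computed here.

```python
def part_box(joints, pad, width, height):
    """End-exclusive box (x0, y0, x1, y1) around a part's joints, padded and clipped."""
    xs = [x for x, _ in joints]
    ys = [y for _, y in joints]
    x0 = max(0, int(min(xs)) - pad)
    y0 = max(0, int(min(ys)) - pad)
    x1 = min(width, int(max(xs)) + pad + 1)
    y1 = min(height, int(max(ys)) + pad + 1)
    return x0, y0, x1, y1


def crop_resize(image, box, out_w, out_h):
    """Crop a row-major image and resize the crop with nearest-neighbour sampling."""
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in image[y0:y1]]
    h, w = len(crop), len(crop[0])
    return [[crop[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def segment_pair(frame_a, frame_b, joints_a, joints_b, pad, size):
    """Crop the same part from two consecutive frames to a fixed size x size patch.

    One shared box covers the part in both frames, so pixel coordinates agree
    and optical flow can later be computed between the two patches.
    """
    height, width = len(frame_a), len(frame_a[0])
    box = part_box(list(joints_a) + list(joints_b), pad, width, height)
    return crop_resize(frame_a, box, size, size), crop_resize(frame_b, box, size, size)
```

Using one box for both frames is the design choice that matters: if each frame were cropped around its own joints, the patch origin would shift between frames and the apparent flow would mix camera-like shifts with the actual limb motion.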
Keywords/Search Tags: spatio-temporal stream convolutional neural network, behavior recognition, limb decomposition, posture segmentation