Research On Key Technologies Of Video-based Human Action Recognition

Posted on: 2018-11-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H F Chen
Full Text: PDF
GTID: 1318330512485997
Subject: Security emergency information technology
Abstract/Summary:
Video-based human action recognition supports automatic intelligent analysis tasks such as network video retrieval and analysis, intelligent video surveillance, and intelligent video monitoring. It has attracted wide attention in academia, and a large number of scholars are working on it. Although research on human action recognition has made great progress, analyzing actions in videos remains very challenging because of variations in action speed, viewpoint, and background. Feature extraction is the most important step in the process and strongly affects recognition accuracy. This dissertation therefore studies algorithms for deep-learned features of human action, with the support of project funding. The main contributions are as follows.

(1) Frame sampling based on decomposition of action primitives. Existing video frame sampling methods for deep-learned features are equalization sampling and sequential sampling. Both ignore changes in the duration of action primitives (or states) within a human action and are not robust to the temporal scale of action primitives. To address this problem, the relationship between human action and the similarity of video frames is analyzed, and a video frame sampling algorithm based on decomposition of action primitives, which uses the Hamming distance between adjacent frames, is proposed. Experimental results show that the recognition rate of the proposed method is at least 3.5% higher than that of equalization sampling and sequential sampling on the HMDB51 dataset.

(2) Image sampling based on motion saliency. Current image sampling methods for deep-learned features are image scaling sampling, image center sampling, and center quadrilateral sampling, none of which can sample the image around the area of human behavior. To overcome this problem, the latest motion saliency detection algorithm is improved and applied to image sampling. The algorithm correctly samples the image blocks required by the convolutional network according to the motion area of the action. The proposed method thus effectively captures information about changes in human behavior and extracts human action features with excellent discriminative ability. Experiments show that the proposed method is more efficient than the traditional image sampling methods, with gains in action recognition performance of more than 2.7%.

(3) Human action recognition based on multimodal features. There has been no comparative study of RGB images, optical flow, and other modal data for deep-learned action features, nor any research on deep-learned features using the newest modal data: motion boundary and gradient boundary. This dissertation introduces motion boundary and gradient boundary for deep-learned feature extraction of human action and compares all modal features and their fusion in terms of action recognition performance. Experiments show that the deep-learned motion boundary and gradient boundary features have strong action characterization ability, and that fusing multiple modal features at the temporal feature layer gives better action recognition than fusing them at the convolutional feature layer.

(4) Human action recognition based on real-time global motion compensation. Optical flow is the best modal data for human action, but computing dense optical flow is time-consuming. Zhang et al. proposed the EMV-CNN algorithm [84], which obtains deep-learned features in real time by using motion vectors instead of optical flow. However, EMV-CNN does not eliminate the interference of global motion information with human behavior information. This dissertation proposes a real-time global motion estimation and compensation method based on the symmetry and difference theory of global motion vectors. Experiments show that the EMV-CNN feature with global motion compensation significantly improves the behavior recognition rate while preserving real-time performance.

In summary, this work studies the robustness of frame sampling and image sampling for behavior videos, the complementarity of multimodal features, and the real-time characteristics of deep-learned features. A video frame sampling algorithm based on motion decomposition is proposed, and the performance of the motion characteristics is analyzed. The behavioral characterization ability of multiple modal features and of their fused features is compared. A real-time global motion estimation and compensation method is proposed according to the symmetry and difference theory of global motion vectors. These research achievements significantly improve the performance of human action recognition.
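
To make the frame sampling idea in contribution (1) concrete, the following is a minimal sketch and not the dissertation's algorithm. The abstract states only that the Hamming distance between adjacent frames is used. The binarization (an average-hash style fingerprint), the `threshold` parameter, the choice of the middle frame of each segment, and all function names are assumptions made for illustration. Frames are assumed to be small grayscale thumbnails given as nested lists.

```python
from typing import List, Sequence

# Assumed input: a small grayscale thumbnail (rows of pixel intensities).
Frame = Sequence[Sequence[int]]


def average_hash(frame: Frame) -> List[int]:
    """Binary fingerprint: 1 where a pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def sample_by_primitives(frames: Sequence[Frame], threshold: int) -> List[int]:
    """Return one frame index per segment of mutually similar frames.

    A new segment (a stand-in for an action primitive) starts whenever the
    Hamming distance between adjacent fingerprints exceeds `threshold`.
    The middle frame of each segment is sampled (an assumed choice).
    """
    if not frames:
        return []
    hashes = [average_hash(f) for f in frames]
    boundaries = [0]
    for i in range(1, len(frames)):
        if hamming(hashes[i - 1], hashes[i]) > threshold:
            boundaries.append(i)
    boundaries.append(len(frames))
    # Each (start, end) pair is a half-open segment; pick its middle index.
    return [(start + end - 1) // 2 for start, end in zip(boundaries, boundaries[1:])]
```

In this sketch a long segment and a short segment each contribute one sampled frame, so the sampled set does not depend on how long each primitive lasts. That is the property the abstract says equalization sampling and sequential sampling lack.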
Keywords/Search Tags: Human Action Recognition, Convolutional Neural Network, Motion Saliency, Multi-Modal Feature, Global Motion Estimation