Research On Key Technologies Of Video-based Human Action Recognition

Posted on: 2020-08-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Y Li
GTID: 1368330590454116
Subject: Security emergency information technology
Abstract/Summary:
The task of human action recognition in video is to use a computer to automatically process and analyze input video clips and to classify human actions according to the posture and movement of the people in the video and the scene around them. Video action recognition can be used in intelligent surveillance systems, video annotation and retrieval, intelligent nursing care, human-computer interaction and other fields, and has broad application prospects. The scientific and technological value of this research lies in discovering the factors that affect machine recognition of human actions and their interrelations, finding representation models suited to expressing human actions in video, and exploring algorithms for solving these models optimally. This direction has increasingly become a research hotspot in computer vision.

Affected by camera parameters, complex scenes, individual differences and other factors, video action recognition still faces many challenges. Representations based on deep features and on hand-crafted features are currently the most commonly used in action recognition, and both have shortcomings. Because action rhythm varies and a person's position in the video frame is random, equal-interval frame sampling and random image patch sampling cannot focus on the region of interest, which produces many invalid samples. When local features are coded, existing unsupervised dictionary learning does not make full use of video label information, so the resulting dictionary has weak discriminative ability. Optical flow and image information are the main sources of information in action recognition, but existing methods that concatenate features or superpose them linearly do not make full use of the spatial synchronization between features, which limits the representational ability of the fused features.

In
order to address these problems, this dissertation focuses on video sampling, feature coding and multi-feature fusion. The main research contents and achievements are as follows:

(1) Video Sampling Based on Attention Mechanism and Reinforcement Learning

When video is processed by a deep convolutional network, consecutive frames are usually sampled at fixed intervals, and image patches within the frames are then sampled randomly. This sampling method cannot ensure that the frames and image patches closely related to the action are sampled. To solve this problem, we propose an attention model to guide the sampling of key frames and key image patches. Based on the video information observed so far, the attention model, learned by a neural network, estimates the position of the next key frame relative to the current frame and the relative position of the next image patch to attend to. The model is then solved with a reinforcement learning algorithm. Experiments show that the proposed attention model guides video sampling better than the traditional sampling method and improves action recognition performance.

(2) Feature Coding Based on Multi-instance Learning and Discriminant Dictionary

Action recognition methods based on local feature representations need to code and pool local features to obtain a global representation of the video. Currently, dictionaries for local features are learned without supervision, so dictionary construction is blind and lacks discriminative ability. We propose a discriminative dictionary learning and coding algorithm for local features. The algorithm assumes that similar features exist in every local feature set of the same video category but do not exist in the local feature sets of other categories. Based on this multi-instance hypothesis, we use the learned classifiers as the codewords of the dictionary. To further improve the quality of the dictionary, a cross-validation strategy is also introduced in
the multi-instance discriminative dictionary learning algorithm, together with a strategy that slightly limits the number of positive instances in each set. Experiments show that the proposed method achieves better action recognition performance than other traditional algorithms and can be combined with other algorithms in a complementary way.

(3) Feature Fusion Based on Convolutional Feature Maps and Gating Mechanism

Optical flow features express motion information, and image features express the appearance of the people in the video. Combining the two is the most commonly used approach in action recognition. Optical flow concentrates mainly on the moving parts of the human body, and the appearance information of these areas is key to action recognition. Existing fusion methods either neglect the spatial synchronization between optical flow and image or simply superpose them linearly, and do not make full use of optical flow to guide the extraction of image features. In addition, as the receptive field expands, high-level convolutional features lose low-level detail, and this detail is very important for distinguishing classes with small visual differences. To solve this problem, we propose a gating model based on optical flow features. A control gate is generated from the optical flow convolutional features and used to filter the image information, yielding detailed features closely related to motion. These features are fused with the original optical flow and image features through a recurrent network. Experiments show that fusing the features obtained by the gating mechanism with the original optical flow and image features effectively improves system performance.

In this dissertation, we draw on the mechanisms of human visual perception and cognition to solve the problem of classifying human actions in video. We have carried out exploratory research
on the basic theory and key technologies of video analysis and understanding, and have achieved some innovative results.
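
The gating idea in contribution (3) can be illustrated with a minimal sketch. It is not the dissertation's implementation. It assumes the gate is a sigmoid of a scaled optical flow feature map and that filtering is an element-wise product with the image feature map. The function names (`flow_gate`, `gated_image_features`) and the scalar `weight` and `bias` are hypothetical stand-ins for the learned convolutional layers.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def flow_gate(flow_map, weight=1.0, bias=0.0):
    """Map each optical flow feature to a gate value in (0, 1).

    Assumption: a scalar weight and bias stand in for a learned layer.
    """
    return [[sigmoid(weight * v + bias) for v in row] for row in flow_map]


def gated_image_features(image_map, flow_map, weight=1.0, bias=0.0):
    """Filter image features element-wise with the flow-derived gate."""
    gate = flow_gate(flow_map, weight, bias)
    return [
        [g * x for g, x in zip(gate_row, image_row)]
        for gate_row, image_row in zip(gate, image_map)
    ]


# Toy 2x2 feature maps: large flow values mark moving regions.
flow = [[4.0, -4.0], [0.0, 4.0]]
image = [[1.0, 1.0], [2.0, 2.0]]
gated = gated_image_features(image, flow)
```

In this toy example, image features at positions with strong flow responses pass almost unchanged, while those at positions with strongly negative responses are suppressed. The recurrent fusion of the gated features with the original optical flow and image features is not shown.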
Keywords/Search Tags: Action Recognition, Dictionary Learning, Convolutional Features Fusion, Attention Model, Reinforcement Learning