Research On Feature Extraction Model Of Human Action Based On Visual Cognition

Posted on: 2011-11-18
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Li
GTID: 1118360305457807
Subject: Computer application technology
Abstract/Summary:
Human action recognition (HAR) is one of the most active research areas in computer vision because of its potential applications, such as video surveillance, human-machine interaction, and content-based video retrieval. The key to successful HAR is the extraction of good kinetic features. Good features usually strike a reasonable trade-off between "selectivity" for different action categories and "invariance" to scale, displacement, and orientation. However, recent research has not gone beyond the scope of conventional computer vision, which is essentially a combination of classical signal processing and pattern recognition. As a result, existing methods cannot fully achieve both "selectivity" and "invariance" at the same time.

The primate visual system offers the most reasonable infrastructure for visual pattern recognition. Hubel and Wiesel, winners of the 1981 Nobel Prize in Physiology or Medicine, hypothesized a synthesis principle for the receptive field (RF) of visual neurons. According to their theory, the biological visual pathway consists of multiple layers of visual neurons, where the RF of a neuron at a higher layer is assembled from the outputs of neurons at the layer below. Consequently, the higher the layer a neuron resides in, the larger its RF and the more complex the motion features it is tuned to. To address the problem described above, it is worthwhile to apply experimental evidence from neurophysiology and cognitive psychology to computer-vision-based HAR. This dissertation focuses on three aspects: building a neurobiological model for visual feature extraction, extracting the salient spatio-temporal associations of actions, and improving the model of the tuning properties of simple visual neurons.
A novel feature extraction model of human action is proposed by combining the results of these research aspects.

For the representation of human action, appearance-based feature extraction methods do not satisfactorily balance "selectivity" and "invariance". To solve this problem, the research concentrates on modeling the feature extraction mechanism of primate visual neurons. Consistent with the synthesis principle, the synaptic connections between different cortical layers of the primate visual pathway are studied. The author proposes a "feedforward hierarchical model" for visual feature extraction, in which presynaptic-to-postsynaptic connections between layers are modeled as weighted connections. The model has a topology similar to that of a traditional artificial neural network, with layers alternately implementing two types of functions: (1) reflecting the best tuning property, namely the "selectivity", of visual neurons at different layers; and (2) implementing two kinds of competition that help obtain the maximal "invariance" of motion features. The cognitive ability of the model increases toward higher layers, and a semantic representation of the visual stimulus is finally established at the top of the model.

In spatio-temporal volume based HAR models, volume analysis of the moving object cannot represent the salient moving regions effectively and therefore extracts many motion-irrelevant regions. To address this problem, the dissertation studies the spatio-temporal association of human action. Extracting only the body parts that undergo salient local movement reduces the use of redundant information during motion feature extraction. In the preprocessing stage, the author proposes algorithms that extract the motion salient region (MSR) from two different perspectives: distance computation and local motion energy computation.
The first algorithm analyzes the geometric features of the image by solving a discrete approximation of the Poisson equation; the second exploits the kinetic features of the moving object by finding the maximum of its local motion energy across different orientations. Experimental evidence shows that both algorithms can effectively extract the body parts with salient motion. The MSR is then used as the input to the feature extraction model introduced above and plays a significant role in improving the effectiveness of the action representation.

The 2D Gabor wavelet is commonly used to simulate the tuning properties of simple cells in the primate visual cortex. However, the 2D Gabor function always includes a direct current (DC) component and lacks high-frequency response. The dissertation studies these drawbacks of the Gabor function and proposes a 2D Log-Gabor wavelet. Log-Gabor wavelets at multiple scales and orientations can represent low-level motion features more compactly. Based on experimental evidence, the best parameter configuration for the 2D Log-Gabor wavelet is determined, providing good coverage of the spectrum to which the primate visual cortex is tuned. The advantage of using the Log-Gabor wavelet with this parameter configuration is twofold: first, it represents low-level motion properties more comprehensively; second, it reduces the number of parameters that must be configured, which in turn improves the computational efficiency of the feature extraction model.
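
As a rough illustration of the DC-free property of the Log-Gabor wavelet, the sketch below builds a frequency-domain 2D Log-Gabor transfer function using a commonly used formulation: a Gaussian on the logarithmic frequency axis multiplied by a Gaussian in orientation. It is not taken from the dissertation. The function name log_gabor_transfer, the grid size, the centre frequencies, and the bandwidth values sigma_ratio and sigma_theta are all illustrative assumptions, not the tuned configuration reported by the author.

```python
import math


def log_gabor_transfer(size, f0, theta0, sigma_ratio=0.55, sigma_theta=math.pi / 8):
    """Frequency-domain 2D Log-Gabor transfer function on a size x size grid.

    The grid uses the unshifted FFT layout, so index [0][0] is the DC term.
    All parameter values are illustrative assumptions.
    """
    log_sigma = math.log(sigma_ratio)
    grid = []
    for row in range(size):
        # Signed normalized frequency in cycles/pixel, in [-0.5, 0.5)
        fy = (row if row < size / 2 else row - size) / size
        line = []
        for col in range(size):
            fx = (col if col < size / 2 else col - size) / size
            radius = math.hypot(fx, fy)
            if radius == 0.0:
                # log(0) is undefined; the Log-Gabor has zero response at DC
                line.append(0.0)
                continue
            # Radial part: Gaussian in log-frequency centred on f0
            radial = math.exp(-(math.log(radius / f0) ** 2) / (2 * log_sigma ** 2))
            # Angular part: Gaussian in orientation, difference wrapped to [-pi, pi]
            dtheta = math.atan2(fy, fx) - theta0
            dtheta = math.atan2(math.sin(dtheta), math.cos(dtheta))
            angular = math.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
            line.append(radial * angular)
        grid.append(line)
    return grid


# A small bank: 2 scales x 4 orientations (hypothetical values)
scales = (0.125, 0.25)
orientations = (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4)
bank = [log_gabor_transfer(32, f0, th) for f0 in scales for th in orientations]
```

In practice, each transfer function would typically be multiplied with the 2D Fourier transform of a frame and the product inverse-transformed to obtain the filter response; that step is omitted here to keep the sketch free of external dependencies.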
Keywords/Search Tags: feature extraction of human action, action recognition, feedforward hierarchical structure, visual cortex, synthesis of receptive field, spatio-temporal motion salient region