Research On The Analysis And Recognition Of Human Actions In Video

Posted on: 2016-09-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J F Yang
GTID: 1108330473456077
Subject: Communication and Information System
Abstract/Summary:
The analysis and representation of human motion in video is an important branch of computer vision whose main task is to detect, extract, and represent human motion information. Research on human action recognition is multidisciplinary and has great theoretical and practical value. Because human motion is complex and diverse, video action recognition remains difficult to apply in real-world settings despite about a dozen years of research. Many problems remain open in the analysis and representation of human motion, which is the core of action recognition. The research in this dissertation consists of the following four components:

1. The motion of the human body forms a three-dimensional volume in space and time. The shape of this volume is an important cue for action representation, and it can be captured by spatio-temporal (ST) neighboring features. To describe local ST neighboring features accurately, we propose two novel algorithms: one based on regular-polyhedron ST neighborhood features, the other based on multi-scale ST oriented neighborhood features. In the former, the axes of a regular polyhedron serve as a reference positioning system, so that the relative positions of the features within a neighborhood can be described accurately. In the latter, oriented ST neighborhood features are built by introducing ST scale parameters into the computation of distances between features.

2. The covariance feature is a powerful local feature. We represent human motion information as covariance features and study action recognition in two cases. In the first case, the covariance features are mapped from Riemannian space to the Log-Euclidean space by the matrix logarithm, followed by clustering and coding.
In the second case, to preserve the geometric information of covariance features lying on the Riemannian manifold, the clustering algorithm is applied directly to the covariance matrices to obtain a Riemannian matrix dictionary, and the covariance features are then encoded with the proposed local Riemannian manifold coding algorithm. In addition, batch-average-update and sequential-average-update in the covariance-matrix clustering stage are further investigated under different distance measures on symmetric positive definite (SPD) matrices.

3. A random forest based on the Grassmann manifold is constructed and used for human action recognition. Traditional methods for producing a local spatio-temporal feature descriptor divide the 3D volume into several sub-volumes with a spatial grid, compute a sub-histogram for each sub-volume, and concatenate all sub-histograms into a high-dimensional feature that serves as the descriptor of the 3D volume. The grid, however, undermines the spatial relevance between frames. To preserve this relevance information, we convert each frame directly into a column vector, so that the 3D volume is represented as a matrix of such columns. The Grassmann distance is used to measure the similarity between these matrices. Finally, the probability distribution of the samples on the Grassmann manifold is learned by the random forest based on the Grassmann manifold.

4. Feature coding is well known to play an important role in action recognition and has long been a research focus. Building on a study of the Locality-constrained Linear Coding (LLC) algorithm, we propose a weighted LLC (WLLC) algorithm. LLC is an excellent sparse coding algorithm whose advantages include sparse codes, fast coding speed, and small reconstruction error. One disadvantage of LLC is that the probability distribution of the data around the cluster centers is completely discarded in the dictionary learning stage.
As a result, each selected codeword contributes equally to the LLC code in the coding stage. The idea behind WLLC is that different codewords have different credibility, because the probability distribution of the samples around them differs, and a codeword with high credibility should contribute more to the LLC code when a feature is coded. Experimental results show that the WLLC method effectively improves system performance. Moreover, since feature position is critical information in action representation, a novel mixed feature is proposed that combines the feature position with the feature descriptor. The mixed feature is then encoded by the proposed multi-scale spatial position coding algorithm in order to describe the motion information in space effectively.

At the end of the dissertation, we summarize the advantages and disadvantages of the proposed methods and outline directions for future work.
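
To make the first covariance-feature case (component 2) more concrete, the following is a minimal sketch, not the dissertation's implementation, of mapping a covariance descriptor to Log-Euclidean space with the matrix logarithm. The function names (`covariance_descriptor`, `spd_logm`, `log_euclidean_vector`), the regularization constant `eps`, the sqrt(2) scaling of off-diagonal entries, and the random feature matrix are all assumptions made for illustration; the abstract does not say which low-level features enter the covariance matrix.

```python
import numpy as np


def covariance_descriptor(features, eps=1e-6):
    """Covariance of feature vectors (rows = samples, columns = feature dims).

    eps * I is added (an assumption) so the matrix is strictly positive
    definite and its logarithm is well defined.
    """
    cov = np.cov(features, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])


def spd_logm(spd):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(spd)
    # V * diag(log(lambda)) * V^T; broadcasting scales each eigenvector column.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T


def log_euclidean_vector(spd):
    """Flatten log(spd) into a Euclidean vector.

    Off-diagonal entries are scaled by sqrt(2) so that the vector's
    Euclidean norm equals the Frobenius norm of the log matrix.
    """
    log_mat = spd_logm(spd)
    rows, cols = np.triu_indices(log_mat.shape[0])
    scale = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return scale * log_mat[rows, cols]


# Hypothetical input: 200 samples of a 5-dimensional low-level feature.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))

cov = covariance_descriptor(features)
vec = log_euclidean_vector(cov)  # 5 * 6 / 2 = 15 entries
```

Vectors produced this way live in an ordinary Euclidean space, so standard clustering (for example k-means) and coding can be applied to them, which is the step the abstract describes next for this case.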
Keywords/Search Tags: digital image processing, action recognition, action analysis