
Research On Human Action Recognition Based On Multimodal Information Fusion

Posted on: 2021-04-18; Degree: Master; Type: Thesis
Country: China; Candidate: Q S Zhao
GTID: 2428330623465053; Subject: Computer technology
Abstract/Summary:
The analysis and understanding of human actions and behaviors is one of the main research topics of modern psychology. With the development of artificial intelligence and the growth of computing power, human action recognition has gradually become a research hotspot in computer vision and image processing. The topic has important theoretical value as well as broad application prospects, mainly because of its potential uses in human-computer interaction, health monitoring, intelligent security, video analysis, and other fields. Supported by the National Natural Science Foundation of China (NSFC) project "Research on the Method of Human Action Recognition Based on RGBD Image Sequence and Acceleration Signal Fusion", this thesis studies human action recognition that requires multimodal information fusion in complex real-world scenarios. The work is developed from two aspects: human action recognition based on fusing heterogeneous features from single-modal data, and human action recognition based on fusing heterogeneous features from multimodal data.

First, for action recognition with single-modal data, the main work and contributions of this thesis are as follows.

(1) A human action recognition method based on time-frequency domain feature fusion of acceleration data. The method first extracts a frequency-domain feature of the acceleration data based on the short-time Fourier transform (STFT). Experimental analysis showed that this feature discriminates well between small local body movements and large limb movements, but has difficulty distinguishing movements that differ mainly in movement frequency. A time-frequency domain feature of the acceleration data, wavelet packet decomposition (WPD), is then extracted; experimental analysis showed that it discriminates frequency-sensitive movement types better. Finally, the two feature representations are fused at the decision level. The method overcomes the inability of a single feature representation to discriminate certain types of actions, improves the discriminative power of the acceleration feature encoding, and achieves better recognition results.

(2) A human action recognition method based on spatio-temporal feature fusion of skeleton data. The method encodes the temporal and spatial cues of the skeleton data separately. First, a view-invariant geometric feature rich in temporal cues, PoJM3D, is extracted from the skeleton data; it is a dimension-reducing projection of the three-dimensional human skeleton into a one-dimensional angular space. Next, a physical feature rich in spatial action information, MoP, is extracted from the momentum of the skeleton joints. Experimental analysis shows that PoJM3D can effectively compensate for the lack of explicit spatial information. By extracting strongly discriminative representations from both the temporal and the spatial dimensions of the skeleton data, the method improves the discriminative power of skeleton feature encoding and achieves effective recognition.

Second, for action recognition with multimodal data, the main work and contribution of this thesis is a human action recognition method based on multimodal feature fusion via correlation analysis. The method fuses heterogeneous features from acceleration data and visual data, where the visual data comprise human skeleton data and depth map sequences. Because acceleration data contain rich temporal cues, short-time Fourier transform features are extracted from the acceleration time series. Visual data are generally rich in three-dimensional spatial information, so they are represented by features such as spatio-temporal cube pyramids (STCP). The two heterogeneous features from different modalities are then fused using matrix correlation analysis. The results demonstrate that heterogeneous features from different modalities can be fused efficiently to improve recognition performance.

Finally, this thesis analyzes the limitations of existing action recognition datasets and, in response, develops and implements a multimodal human action data acquisition system. Although many datasets are dedicated to human action recognition, current methods remain severely limited in the variability and complexity of the actions they can recognize. To meet the needs of this research, two large multimodal action recognition datasets were collected on this platform. With its realistic scenes and complex collection strategy, the Free-PASS dataset exposes the true difficulty of action recognition in real-world scenes.
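The short-time Fourier transform feature of contribution (1) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation: the Hann window, the 64-sample window length, the 32-sample hop, the per-axis mean removal and the averaging of magnitudes over frames are all assumptions chosen for illustration.

```python
import numpy as np

def stft_features(acc, win_len=64, hop=32):
    """Mean STFT magnitude per frequency bin for each axis of an (n_samples, n_axes) signal.

    Assumed settings (not from the thesis): Hann window, win_len=64, hop=32.
    Requires n_samples >= win_len.
    """
    acc = np.asarray(acc, dtype=float)
    window = np.hanning(win_len)
    n_frames = 1 + (acc.shape[0] - win_len) // hop
    feats = []
    for axis in range(acc.shape[1]):
        x = acc[:, axis] - acc[:, axis].mean()  # remove the DC offset (e.g. gravity)
        # Slice overlapping windowed frames: shape (n_frames, win_len)
        frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
        spec = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum, (n_frames, win_len // 2 + 1)
        feats.append(spec.mean(axis=0))  # average each frequency bin over time
    return np.concatenate(feats)  # length n_axes * (win_len // 2 + 1)

# Usage: a hypothetical tri-axial signal sampled at 64 Hz with an 8 Hz component on the x axis.
fs = 64
t = np.arange(256) / fs
acc = np.stack([np.sin(2 * np.pi * 8 * t), np.zeros(256), np.zeros(256)], axis=1)
features = stft_features(acc)
```

With a 64-sample window at 64 Hz each bin spans 1 Hz, so the x-axis block of `features` peaks at bin 8. A classifier would then be trained on such vectors; the WPD branch would produce a second vector for the same window.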
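The decision-level fusion step of contribution (1) combines the outputs of the STFT-based and WPD-based classifiers rather than their features. The abstract does not state the fusion rule, so the sketch below assumes one common choice, a weighted average of per-class probabilities; the function name `fuse_decisions` and the weight are hypothetical.

```python
import numpy as np

def fuse_decisions(prob_a, prob_b, weight_a=0.5):
    """Decision-level fusion by weighted averaging of two classifiers' class probabilities.

    prob_a, prob_b: arrays of shape (n_samples, n_classes), e.g. from an STFT-based
    and a WPD-based classifier (hypothetical). Returns (labels, fused_scores).
    """
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    if prob_a.shape != prob_b.shape:
        raise ValueError("both classifiers must score the same samples and classes")
    fused = weight_a * prob_a + (1.0 - weight_a) * prob_b  # weighted sum rule
    return fused.argmax(axis=1), fused

# Usage with made-up scores for two samples and three classes.
stft_probs = [[0.6, 0.3, 0.1], [0.4, 0.35, 0.25]]
wpd_probs = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
labels, fused = fuse_decisions(stft_probs, wpd_probs)
```

In the second sample the STFT classifier alone would pick class 0, but the confident WPD output moves the fused decision to class 1, which is the kind of complementarity the method relies on for frequency-sensitive actions.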
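The MoP feature of contribution (2) is described only as being based on the momentum of the skeleton joints. The sketch below shows one plausible reading under stated assumptions: velocities from frame-to-frame finite differences, hypothetical per-joint masses, an assumed frame rate, and mean and maximum momentum magnitudes as the summary statistics. None of these choices are confirmed by the source.

```python
import numpy as np

def momentum_features(joints, masses, fps=30.0):
    """Per-joint momentum statistics from a (T, J, 3) skeleton sequence.

    masses: hypothetical per-joint masses, shape (J,). fps: assumed frame rate.
    Returns [mean |p| per joint, max |p| per joint], length 2 * J.
    """
    joints = np.asarray(joints, dtype=float)
    masses = np.asarray(masses, dtype=float)
    velocity = np.diff(joints, axis=0) * fps           # finite-difference velocity, (T-1, J, 3)
    momentum = masses[None, :, None] * velocity        # p = m * v for every frame and joint
    magnitude = np.linalg.norm(momentum, axis=2)       # |p|, shape (T-1, J)
    return np.concatenate([magnitude.mean(axis=0), magnitude.max(axis=0)])

# Usage: two joints over four frames; joint 0 moves 0.1 m per frame along x, joint 1 is still.
joints = np.zeros((4, 2, 3))
joints[:, 0, 0] = np.arange(4) * 0.1
mop = momentum_features(joints, masses=[2.0, 1.0], fps=30.0)
```

At 30 frames per second joint 0 moves at 3 m/s, so with an assumed mass of 2.0 its momentum magnitude is 6.0 in every frame, while the static joint contributes zeros.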
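The multimodal method fuses the acceleration and visual features by correlation analysis. The abstract does not name the exact variant, so the sketch below substitutes standard canonical correlation analysis (CCA) with a small ridge term, followed by concatenation of the two projected feature sets. The regularization value, the number of components and the serial (concatenation) fusion are assumptions, and the synthetic data only stand in for STFT and STCP features.

```python
import numpy as np

def _inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_fuse(x, y, n_components=2, reg=1e-3):
    """Project two feature sets onto their top canonical directions and concatenate them.

    x: (n, dx) features from one modality; y: (n, dy) from another.
    n_components must not exceed min(dx, dy). Returns (fused, canonical_correlations).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    n = x.shape[0]
    sxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])  # regularized within-set covariances
    syy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    sxy = xc.T @ yc / (n - 1)                              # between-set covariance
    kx, ky = _inv_sqrt(sxx), _inv_sqrt(syy)
    u, corr, vt = np.linalg.svd(kx @ sxy @ ky)             # singular values = canonical correlations
    wx = kx @ u[:, :n_components]                           # canonical directions for x
    wy = ky @ vt.T[:, :n_components]                        # canonical directions for y
    fused = np.hstack([xc @ wx, yc @ wy])                   # serial fusion of both projections
    return fused, corr[:n_components]

# Usage: synthetic data where one latent signal z appears in both modalities.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
acc_feat = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
vis_feat = np.hstack([rng.normal(size=(500, 1)), -z + 0.1 * rng.normal(size=(500, 1))])
fused, corr = cca_fuse(acc_feat, vis_feat)
```

CCA finds the shared latent signal even though it sits in different columns and with opposite sign in the two modalities, so the first canonical correlation is close to 1. A discriminant variant that also uses class labels would be a natural alternative if the thesis's "correlation analysis" is supervised.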
Keywords/Search Tags: multi-modal machine learning, human action recognition, feature representation, multi-modal information fusion