Human behavior analysis is a critical research field in computer vision, with wide applications in areas such as human-computer interaction and virtual reality. Unlike RGB images, skeleton data suppresses complex backgrounds and adapts to dynamic environments, effectively highlighting the subject's behavior. As a result, many researchers have focused on skeleton-based human motion recognition. Graph convolutional networks have achieved significant success in modeling the topology of the skeleton. However, related work often represents only the natural connections between body parts, which can introduce redundant dependencies and accumulated errors, ultimately reducing recognition accuracy. In addition, temporal feature extraction relies heavily on a temporal convolution module whose fixed kernel cannot adapt to feature variations over time, leading to inadequate local feature extraction. Furthermore, the temporal convolution module fails to establish long-term temporal dependencies when processing long sequences, losing duration-based dependencies. To address these shortcomings, we propose a novel algorithm that combines graph convolution units with Transformer models. Our design uses learnable channel-wise relationship matrices and adjacency matrices to express joint associations, while depthwise separable convolution expands feature-map channels to represent higher-order spatial information. From the perspective of feature extraction, we develop two model architectures: a graph convolution-Transformer multi-stage spatial-temporal feature extraction model and an end-to-end graph relative Transformer model. Experimental validation demonstrates the effectiveness of both proposed models.

To enhance the accuracy and robustness of human motion recognition, we aim to interpret
the diverse kinematic information and fully exploit the latent features of motion. However, integrating different dynamic features presents several challenges. Different kinds of kinematic information are partly redundant and partly complementary, so combining them and extracting the most pertinent content is an important issue. Moreover, different dynamic information varies in importance and contribution, so effectively selecting the crucial motion cues and improving the efficiency and robustness of the features is critical. To address these issues, we propose a multi-stream graph Transformer model with a two-branch design, where each branch extracts dynamic information of different importance. The main branch extracts long-term dynamic changes, while the auxiliary branch focuses on short-term dynamic changes. Unlike traditional multi-branch fusion of output features, our auxiliary unit exchanges information with the main branch through a cross-attention mechanism. Experimental results show that the model achieves high performance and strong generalization on challenging tasks.

However, capturing skeleton data imposes requirements on operational cost and scene setup. One way to address the difficulty of obtaining skeleton data is to generate new human motion data from existing motion data. We propose a technical route for the motion generation task, aiming to solve two critical problems in character motion generation: automated generation of skeleton sequences from RGB image sequences, and improving the accuracy and applicability of those sequences. First, we use a CNN+Transformer architecture to generate skeleton data sequences from RGB image sequences, handling complex poses and movements that are difficult for traditional methods. Second, we propose a spatial-temporal Transformer-based skeleton data refinement model, which adaptively learns the
spatial-temporal relationships in motion data to improve its accuracy and applicability, enabling the skeleton sequences generated by the first method to better serve downstream applications. We validate the performance of our character motion generation model through small-scale experiments, providing guidance and a rationale for subsequent model optimization and improvement.
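To make the joint-association idea concrete, the following is a minimal NumPy sketch of a graph convolution step that combines a fixed skeleton adjacency matrix with a learnable relationship matrix, so the model can express links beyond the natural bone connections. All shapes, names, and the identity placeholder for the skeleton adjacency are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

V, C_in, C_out = 25, 3, 16  # joints, input/output channels (illustrative)

# Fixed adjacency from the skeleton's natural connections (here a placeholder
# identity; a real model would use the normalized bone-connection matrix),
# plus a freely learnable matrix that can add non-physical joint links.
A_skeleton = np.eye(V)
A_learned = rng.normal(scale=0.01, size=(V, V))  # trained by backprop in practice
A = A_skeleton + A_learned

X = rng.normal(size=(V, C_in))   # one frame of per-joint features
W = rng.normal(size=(C_in, C_out))

# Graph convolution: aggregate neighbor features over A, then mix channels.
H = A @ X @ W                    # shape (V, C_out)
print(H.shape)
```

Because A_learned is unconstrained, gradient descent can strengthen or suppress any joint pair, which is how redundant fixed dependencies can be avoided.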
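The cross-attention interaction between the two branches can be sketched as below: queries come from the main (long-term) branch while keys and values come from the auxiliary (short-term) branch, so short-term cues are injected into the main stream rather than fused only at the output. The single-head formulation, dimensions, and residual fusion are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

T, d = 10, 8                     # sequence length, feature dimension (illustrative)
main = rng.normal(size=(T, d))   # long-term features from the main branch
aux = rng.normal(size=(T, d))    # short-term features from the auxiliary branch

# Single-head cross-attention: main-branch queries attend to auxiliary keys/values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = main @ Wq, aux @ Wk, aux @ Wv

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over auxiliary positions

fused = main + weights @ V       # residual fusion back into the main branch
print(fused.shape)
```

In a trained model the projection matrices are learned, letting the attention weights select which short-term dynamics matter for each long-term position.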