| Artificial intelligence has advanced steadily for decades, and continuing improvements in hardware such as the graphics processing unit (GPU) have accelerated the development of deep learning. The recent popularity of the "Metaverse" concept has drawn attention to human-computer interaction, which in turn depends on human behavior recognition. Human behavior recognition is also widely used in smart cities, smart medical care, video retrieval, and other fields. Video-based human behavior recognition is sensitive to illumination, camera angle, and color tone, which inevitably reduces recognition accuracy, and such algorithms are relatively large and lack good real-time performance. Analyzing the human skeleton sequence instead avoids problems such as illumination and camera angle. Current research on skeleton-based action recognition, however, struggles to achieve accuracy and real-time performance at the same time: highly accurate algorithms are large and slow, while lightweight, efficient models still leave considerable room for improvement in recognition accuracy. This thesis takes deep-learning-based human skeleton action recognition as its research topic and uses human skeleton sequences as the data set. The raw data are first preprocessed and then passed through a series of convolutional neural networks, and a computer-vision self-attention mechanism suited to the characteristics of the data is introduced, with the aim of improving recognition accuracy while keeping the algorithm lightweight and real-time. The main research work is as follows:
1. A dual-modal attention mechanism is combined with a graph convolutional network model. The attention mechanism adjusts the relative weights of different joints in the spatial dimension and learns the relative weights of different frames in the temporal dimension, and the graph convolutional network is combined with a convolutional neural network to recognize actions in the skeleton sequence and improve model accuracy. 2. For the original skeleton sequence data, a graph convolutional network is used because it is better suited to extracting features from graph-structured data. An adaptive adjacency matrix, together with residual learning units and dense connections, allows the model to better extract the hidden features in the data. 3. A dual-stream network architecture with dual frame rates is used. One stream samples the data at a lower frame rate and focuses on spatial-domain features, while the other samples at a higher frame rate and focuses on temporal-domain features; the features of the two streams are fused through lateral connections. For the rest of the network, graph convolutional networks are again used for feature extraction to improve the accuracy of the model. |
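
The building blocks named in the three contributions can be illustrated with a minimal, dependency-free sketch. It is not the thesis implementation: the function names, the row normalization of the adjacency matrix, the use of a plain softmax for attention weights, and the stride-based frame sampling are all assumptions chosen for illustration. The sketch shows one graph convolution layer with an adaptive adjacency offset and a residual connection, attention weights over joints or frames, and sampling of one sequence at two frame rates.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each joint averages over its neighbors."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[value / sum(row) for value in row] for row in looped]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def graph_conv(x, adj, adaptive, weight):
    """One layer computing ReLU((A_norm + B) X W) + X.

    x: joints x channels features; adj: fixed skeleton adjacency;
    adaptive: learned offset B (hypothetical name); weight: channels x channels.
    """
    a_norm = normalize_adjacency(adj)
    a_eff = [[a + b for a, b in zip(row_a, row_b)]
             for row_a, row_b in zip(a_norm, adaptive)]
    out = matmul(matmul(a_eff, x), weight)
    # ReLU on the convolved features, then the residual (skip) connection.
    return [[max(0.0, o) + xi for o, xi in zip(row_o, row_x)]
            for row_o, row_x in zip(out, x)]


def softmax(scores):
    """Turn raw scores (per joint or per frame) into attention weights summing to 1."""
    peak = max(scores)
    shifted = [math.exp(s - peak) for s in scores]
    total = sum(shifted)
    return [v / total for v in shifted]


def sample_frames(sequence, stride):
    """Keep every `stride`-th frame; a larger stride gives a lower frame rate."""
    return sequence[::stride]


# Usage: a three-joint chain (0-1-2) with two feature channels per joint.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
no_offset = [[0.0] * 3 for _ in range(3)]   # adaptive part before any training
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = graph_conv(features, adjacency, no_offset, identity)

frames = list(range(8))
slow_stream = sample_frames(frames, 4)  # lower frame rate, spatial emphasis
fast_stream = sample_frames(frames, 1)  # higher frame rate, temporal emphasis
```

In a trained model the adaptive offset and the weight matrix would be learned parameters, and the attention scores would come from the network rather than being supplied by hand; here they are fixed inputs so that the data flow stays visible.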