Research On Action Recognition Based On Skeleton And Attention Mechanism

Posted on: 2022-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: S Li
GTID: 2558307154479374
Subject: Engineering
Abstract/Summary:
With the development of information technology, action recognition is being applied ever more widely and is closely tied to everyday life, in fields such as video surveillance, medical diagnosis, and virtual games. The purpose of action recognition is to recognize human movements from video, so RGB video sequences have become the most intuitive and commonly used data format. With the continuous improvement of hardware in recent years, however, more data formats have emerged, such as depth maps, human skeleton maps, and infrared sequences, each with its own advantages and disadvantages. Whether actions are recognized from RGB video or from other forms of data, the most important step is feature extraction: the effectiveness of the extracted features has a great impact on the recognition efficiency of the model. How to extract more effective and discriminative features is therefore the focus of current research. In response to this problem, this thesis makes the following two contributions.

First, this thesis proposes a deep learning method that uses the attention mechanism to extract features. The human body is modeled as a skeleton graph with 25 main joints, and joint features are extracted by a convolutional neural network. Second-order bone information is then derived from the joint information, and the joint and bone information are fused to capture the strong coupling between them. When human vision observes a scene, it spontaneously scans the entire field to find the more interesting areas and concentrates on the areas that need attention. Because artificial neural networks imitate the structure and function of biological neural networks, this thesis borrows the idea of human attentional focus and uses the attention mechanism to extract the features that deserve focus. The attention mechanism is incorporated in the spatial, temporal, and channel dimensions, which retains effective features and discards useless ones, thus reducing the risk of overfitting. In addition, a non-local module is added to extract long-range dependencies and to enhance the collaborative feature information between the local and the global, so that virtual connections of the human body are captured in addition to its physical connections.

Second, building on this first piece of work, this thesis proposes a two-stream model. Human motion is a process in which joints and bones cooperate and change dynamically over time, so the motion information along the time dimension also contains a large number of effective features. This thesis therefore designs a motion stream alongside the spatial-temporal stream. The spatial-temporal stream identifies and predicts human actions by extracting spatial-temporal information, while the motion stream extracts motion information to represent the action. The two-stream model then fuses the scores of the spatial-temporal stream and the motion stream to further improve performance. Because the two-stream model is highly complex, this thesis also merges the spatial-temporal and motion branches to reduce the number of parameters, and fuses the features of the two streams at the middle and late stages of the network. Late feature fusion improves model performance, but the parameter reduction is small. Mid-stage feature fusion reduces the parameters by about half, which lowers the computational cost of the network and thus improves learning efficiency, with no significant loss of recognition accuracy.

Finally, ablation experiments are conducted on the NTU RGB+D dataset. The experiments show that the proposed attention-based method extracts effective local features and that the two-stream model improves recognition efficiency.
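
The bone, motion, and score-fusion steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation, because the abstract does not say how these quantities are computed. The sketch assumes three conventions that are common in skeleton-based recognition: a bone is the vector from a parent joint to its child, motion is the frame-to-frame displacement of each joint, and score fusion is a weighted sum of per-class softmax scores. The names `PARENTS`, `bone_features`, `motion_features`, and `fuse_scores` are hypothetical, and the parent table is a toy 5-joint chain rather than the 25-joint layout used in the thesis.

```python
import numpy as np

# Hypothetical parent index for each joint: a toy 5-joint chain.
# Joint 0 is the root and is listed as its own parent.
PARENTS = np.array([0, 0, 1, 2, 3])


def bone_features(joints, parents=PARENTS):
    """Bone vectors from joint coordinates of shape (T, V, C).

    Assumed definition: bone[v] = joint[v] - joint[parent[v]],
    so the root bone is the zero vector.
    """
    return joints - joints[:, parents, :]


def motion_features(joints):
    """Frame-to-frame displacement of each joint, shape (T, V, C).

    The last frame has no successor and is left as zeros so that the
    output keeps the same number of frames as the input.
    """
    motion = np.zeros_like(joints)
    motion[:-1] = joints[1:] - joints[:-1]
    return motion


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)


def fuse_scores(spatial_temporal_logits, motion_logits, weight=0.5):
    """Assumed fusion rule: weighted sum of the two streams' softmax scores."""
    return (weight * softmax(spatial_temporal_logits)
            + (1.0 - weight) * softmax(motion_logits))


# Usage: 3 frames, 5 joints, 3D coordinates.
joints = np.arange(45, dtype=float).reshape(3, 5, 3)
bones = bone_features(joints)      # shape (3, 5, 3)
motion = motion_features(joints)   # shape (3, 5, 3)
fused = fuse_scores(np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0]))
predicted_class = int(np.argmax(fused))
```

In this sketch each stream would be a separate network that produces class logits, and only the final scores are combined. The mid-stage and late feature fusion discussed in the abstract would instead merge intermediate feature maps inside one network, which this example does not attempt to show.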
Keywords/Search Tags: Action recognition, Skeleton topology map, Attention mechanism, Two-stream structure, Feature fusion