As a challenging task in computer vision, action recognition is widely used in human-computer interaction, intelligent video surveillance, autonomous driving, assisted motion correction and other fields. In recent years, with the development of graph convolutional networks (GCNs), skeleton-based action recognition methods that use a GCN as the backbone have achieved very strong results. However, existing models based on spatio-temporal graph convolution still have several shortcomings. First, because the graph convolution kernel covers only a limited spatial neighborhood, it is difficult for the features of distant (higher-order) nodes to exchange information. Second, local detail around the joints is lost when skeleton sequences are extracted, so some similar actions are poorly distinguished. The optical flow modality has been used to compensate for the sparsity of skeleton sequences, but optical flow in turn lacks a sufficient global spatial representation. Addressing these problems has become a major focus of action recognition research. This thesis examines both problems in detail and proposes specific solutions.

(1) The difficulty of higher-order information interaction arises mainly because distant nodes are too far apart in the graph to exchange features. This thesis therefore constructs a hypernode feature that shortens the distance between nodes and overcomes the limited range of the graph convolution kernel. Based on this hypernode feature, a hypernode spatio-temporal graph convolution module is built, in which skeleton features and hypernode features share information so that the features of higher-order joints can be extracted. The module is integrated into the 2s-AGCN model and evaluated on the large public datasets NTU RGB+D, NTU RGB+D 120 and Kinetics-Skeleton to demonstrate its effectiveness. On NTU RGB+D 120, accuracy under the X-Sub and X-Setup protocols increases by 2.4 and 2.1 percentage points, to 86.1% and 87.9%, respectively, which is competitive among skeleton-based models.

(2) Building on this model, and to address the feature information that each single modality lacks, a hypernode-based skeleton and optical flow feature-sharing action recognition model (2S-SNGCN) is constructed, with the hypernode feature serving as the medium between modalities. Because the hypernode feature abstracts away the dimensional differences between modalities, condensing the important information into a unified dimension, the hypernode feature of one modality can attend to the feature differences in the other modality to compensate for its own deficiencies. The original sequence features are then enhanced through dedicated modules, achieving dual-modality feature sharing. The model is built on JOLO-GCN and evaluated on three datasets: NTU RGB+D, NTU RGB+D 120 and Kinetics-400. Compared with JOLO-GCN, accuracy under the X-Sub and X-Setup protocols on NTU RGB+D 120 increases by 2.9 and 3.0 percentage points, to 90.5% and 92.7%, respectively. The model substantially outperforms single-modality models and is also highly competitive among multi-modality models.
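To make the distance argument in (1) concrete, here is a minimal Python sketch under stated assumptions; it is not the thesis's hypernode spatio-temporal graph convolution module. It assumes a hypothetical toy skeleton given as an adjacency list, a hypernode feature obtained by simple mean pooling over all joints, and a weight-free mean-aggregation message-passing step in place of the learned spatio-temporal convolutions. The helper names (`hop_distance`, `add_hypernode`, `hypernode_feature`, `graph_conv_step`) are illustrative only.

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Breadth-first search: number of edges on the shortest path from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None  # dst is unreachable from src


def add_hypernode(adj):
    """Copy the adjacency list and add one extra node linked to every joint."""
    hyper = len(adj)  # assumes joints are labeled 0..N-1
    augmented = {node: list(nbs) + [hyper] for node, nbs in adj.items()}
    augmented[hyper] = list(adj.keys())
    return augmented, hyper


def hypernode_feature(joint_features):
    """Hypothetical hypernode feature: element-wise mean over all joint features."""
    n = len(joint_features)
    return [sum(col) / n for col in zip(*joint_features)]


def graph_conv_step(adj, features):
    """One weight-free message-passing step: each node averages itself and its neighbors."""
    out = {}
    for node, nbs in adj.items():
        group = [features[node]] + [features[nb] for nb in nbs]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out


if __name__ == "__main__":
    # Hypothetical 5-joint chain skeleton: 0-1-2-3-4.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(hop_distance(chain, 0, 4))            # 4
    augmented, hyper = add_hypernode(chain)
    print(hop_distance(augmented, 0, 4))        # 2
    feats = {i: [float(i)] for i in range(5)}
    feats[hyper] = hypernode_feature([feats[i] for i in range(5)])
    print(graph_conv_step(chain, feats)[0])     # [0.5]
    print(graph_conv_step(augmented, feats)[0]) # [1.0]
```

On the five-joint chain, joints 0 and 4 are four hops apart, so a one-hop kernel needs four stacked layers before they interact. With the hypernode every pair of joints is at most two hops apart, and a single step already mixes a summary of all joints, including joint 4, into joint 0.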
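The dual-modality sharing described in (2) can likewise be sketched in plain Python. This is a hypothetical illustration, not the 2S-SNGCN implementation. It assumes that each modality's nodes are split into groups, that each group is mean-pooled and mapped by a fixed projection matrix into a shared dimension D (which is what removes the dimensional mismatch between skeleton and optical flow features), and that one modality's hypernodes then attend to the other's through single-head scaled dot-product attention with a residual addition. The group splits, projection matrices and function names are all assumptions, and the thesis's final step of enhancing the original sequence features is not modeled.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length feature vectors element-wise."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def project(matrix, vector):
    """Matrix-vector product: a D x C matrix maps a C-dim vector to a D-dim vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def build_hypernodes(features, groups, proj):
    """Pool each group of nodes into one hypernode and project it to the shared dimension D."""
    return [project(proj, mean_pool([features[i] for i in group])) for group in groups]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_share(queries, keys):
    """Each query hypernode attends over the other modality's hypernodes
    (scaled dot product, keys reused as values) and adds the result as a residual."""
    d = len(queries[0])
    shared = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        attended = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(d)]
        shared.append([qi + ai for qi, ai in zip(q, attended)])
    return shared


if __name__ == "__main__":
    # Hypothetical skeleton stream: 4 joints with 3-dim features.
    skeleton = [[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]
    # Hypothetical optical flow stream: 6 patches with 2-dim features.
    flow = [[1.0, 1.0]] * 6
    # Both modalities are mapped to the same hypernode dimension D = 2.
    skel_h = build_hypernodes(skeleton, [[0, 1], [2, 3]], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    flow_h = build_hypernodes(flow, [[0, 1, 2], [3, 4, 5]], [[1.0, 0.0], [0.0, 1.0]])
    print(skel_h)                             # [[2.0, 0.0], [0.0, 3.0]]
    print(cross_modal_share(skel_h, flow_h))  # [[3.0, 1.0], [1.0, 4.0]]
```

Although the skeleton features are 3-dimensional and the flow features 2-dimensional, both sets of hypernodes live in the same 2-dimensional space, so the skeleton hypernodes can attend directly to the flow hypernodes. A learned model would replace the fixed projections and the plain residual with trainable layers.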