| With the popularization and development of smartphones, video has become the mainstream form of media communication. Since human action is the main subject of event development, the recognition and prediction of human action is the focus of computer-vision-based video understanding and analysis. Compared with RGB images, skeleton data describes human action well and is robust to complex backgrounds and changes in camera angle. For non-Euclidean data such as skeleton data, how to use graph convolutional networks to extract rich spatial-temporal features is key to constructing human action recognition and motion prediction models, and it is also a research focus of the National Natural Science Foundation of China. The content of this thesis is divided into the following two parts. In the first part, to address the problems that aggregating joint features with spatial convolution is computationally complicated and that the spatial relationships between distant joints, such as the hands, cannot be captured effectively, this thesis proposes a human action recognition algorithm based on a separable spectral convolutional network. First, the thesis designs a static graph based on the physical structure of the human body, derives a globally responsive dynamic graph from the joint motion information, and then derives a first-order separable spectral convolution operation to aggregate the global and local spatial features of the human body’s joints. Second, the thesis introduces a separable gated temporal convolution module that attends to the joint motion information and adaptively adjusts the receptive field of the temporal convolution, so that the network can learn discriminative temporal information in the action sequence. Finally, the thesis designs a cross-modal recognition method to address the overfitting of graph convolutional networks. Experiments on public datasets show that the proposed method achieves internationally leading performance. In the second part, to address the problem that RNN-based recursive prediction networks do not explicitly use the spatial connections and motion information of the body joints, this thesis proposes a human motion prediction algorithm based on a multi-branch graph convolutional network. First, the thesis designs a global spatial-temporal graph and a multi-scale mixup temporal convolution module to encode the features of different actions, and constructs a multi-branch graph convolutional encoder that obtains spatial-temporal features and motion information from the positions and velocities of the human joints. Second, in the decoder module, the thesis introduces a graph-based GRU to recursively predict human motion, and uses residual connections and joint motion information to stabilize the predictions. Finally, the thesis designs a temporally weighted loss function that pays more attention to predictions at early time steps, encouraging the network to be more accurate in the early stage and reducing the accumulation of errors in the model. |
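The idea behind the temporally weighted loss can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so the exponentially decaying weights, the `decay` parameter, the function names, and the `(T, J, 3)` tensor layout below are all assumptions made for illustration, not the formulation used in the thesis.

```python
import numpy as np


def temporal_weights(num_steps: int, decay: float = 0.9) -> np.ndarray:
    """Hypothetical weighting: w_t proportional to decay**t, normalized to sum to 1.

    With decay < 1, earlier prediction steps receive larger weights.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    weights = decay ** np.arange(num_steps, dtype=np.float64)
    return weights / weights.sum()


def temporal_weighted_loss(pred, target, decay: float = 0.9) -> float:
    """Weighted mean per-joint position error.

    pred, target: arrays of shape (T, J, 3), i.e. T predicted time steps,
    J joints, and 3D coordinates (an assumed layout).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Euclidean error per joint, averaged over joints: shape (T,)
    per_step_error = np.linalg.norm(pred - target, axis=-1).mean(axis=-1)
    weights = temporal_weights(pred.shape[0], decay)
    # Weighted sum over time; early steps dominate when decay < 1
    return float(np.dot(weights, per_step_error))
```

Under this sketch, an error of the same size costs more at the first predicted frame than at the last one, which is one way to discourage early mistakes that a recursive decoder would otherwise carry forward. Setting `decay=1.0` recovers the unweighted mean error.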