
Research on Action Recognition Algorithm Based on Bottom-up Key Point Location

Posted on: 2023-11-15 | Degree: Master | Type: Thesis
Country: China | Candidate: Y F Yuan
GTID: 2558307061461154 | Subject: Signal and Information Processing
Abstract/Summary:
Action recognition is a key technology for computer video understanding and analysis, with broad application prospects in abnormal behavior detection, human-computer interaction, and motion analysis. With the spread of deep learning, both key point positioning and action recognition have developed rapidly, and because the two technologies are closely connected, more and more researchers have begun to explore action recognition based on key point positioning. In this pipeline, a key point location algorithm predicts the positions of skeletal key points in each video frame to obtain a skeleton sequence carrying spatio-temporal information, and an action recognition algorithm then extracts features from that sequence to classify the action. Although this approach has achieved good results, problems remain: key point positioning algorithms struggle to balance accuracy against model complexity, and action recognition algorithms extract spatio-temporal features poorly. Motivated by these problems, this research studies a bottom-up key point location algorithm and an action recognition algorithm based on graph convolutional networks. The main contributions are as follows.

(1) A new model, FF-Hourglass, is designed: a lightweight hourglass network with an improved feature fusion mode. Its innovations are threefold. First, the original feature fusion mode of the hourglass network is improved: an attention mechanism based on intra-stage skip connections and a multi-scale fusion mode are designed to make full use of the rich multi-semantic, multi-scale information in the hourglass network, strengthening the model's ability to locate human instances of different scales. Second, center labeling, an offset auxiliary map, and a multi-layer offset auxiliary map are proposed to improve the matching criteria, extending the methods available to the matching algorithm. Third, the network is made lighter from four perspectives (structural strategy, feature extraction method, activation function, and knowledge distillation) to reduce the number of model parameters. Comparative experiments on the large public COCO 2017 dataset verify the effectiveness of each design. The final FF-Hourglass reaches an AP of 70.2% and an AP50 of 89.9% with only 41.3M parameters, striking a good balance between accuracy and lightweight design.

(2) An action recognition algorithm, MsDy-GCN, based on a dynamic multi-stream graph convolutional network, is proposed. Its innovations are threefold. First, the topological graph and the graph convolution are redefined, and a learnable adjacency matrix is introduced to generate a dynamic graph structure that adapts during training, improving the algorithm's ability to model the temporal and spatial dependencies between key points. Second, at the level of the global structure, branch networks that take second-order information and motion information as input are added, and a simple attention mechanism fuses the branches to improve recognition. Third, at the level of the local structure, a spatio-temporal attention mechanism is introduced into the spatio-temporal graph convolution unit so that the network concentrates on crucial regions, enhancing its feature extraction ability. Comparative experiments on the public Kinetics-Skeleton and NTU-RGB+D datasets verify that these designs effectively improve the quality of the extracted features. The final MsDy-GCN achieves Top-1 and Top-5 accuracies of 38.6% and 61.4% on Kinetics-Skeleton, and Top-1 accuracies of 90.5% and 96.1% on the cross-subject (X-Sub) and cross-view (X-View) benchmarks of NTU-RGB+D, comparable to mainstream algorithms on both datasets.

(3) A real-scene dataset covering multiple performers, multiple viewpoints, and multiple scenes is constructed, and the bottom-up key point positioning algorithm FF-Hourglass and the graph-convolution-based action recognition algorithm MsDy-GCN designed above are combined and applied to it. The combined algorithm achieves 97.6% accuracy on this dataset, further demonstrating the robustness of the proposed models.

(4) A visual interface tool is built with the PyQt5 graphical interface library. FF-Hourglass and MsDy-GCN are embedded in the tool to provide its two main functions, key point positioning and action recognition. The tool runs key point positioning and action detection on either a live camera stream or a local video file and displays the results, simplifying deployment in real scenes and making the system easy to operate.
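Contribution (1) names center labeling and offset auxiliary maps as matching criteria for grouping detected key points into people, but the abstract gives no formulas. The following is a minimal sketch of generic center-offset grouping, not the thesis's method: it assumes each key point candidate carries a predicted offset vector pointing to its person's center, and the function name `group_keypoints` and the tuple layout are hypothetical.

```python
import math


def group_keypoints(centers, candidates):
    """Assign key point candidates to detected person centers.

    centers: list of (x, y) person-center peaks, assumed already extracted
        from a center heatmap.
    candidates: list of (joint_id, x, y, dx, dy, score) tuples, where (dx, dy)
        is the predicted offset from the key point to its person's center
        (hypothetical layout).
    Returns one dict per center mapping joint_id -> (x, y, score).
    """
    if not centers:
        return []
    people = [{} for _ in centers]
    for joint_id, x, y, dx, dy, score in candidates:
        # The key point "votes" for the center it believes it belongs to.
        vx, vy = x + dx, y + dy
        # Match the vote to the nearest detected center.
        best = min(
            range(len(centers)),
            key=lambda i: math.hypot(centers[i][0] - vx, centers[i][1] - vy),
        )
        # Keep only the highest-scoring candidate per joint per person.
        prev = people[best].get(joint_id)
        if prev is None or score > prev[2]:
            people[best][joint_id] = (x, y, score)
    return people


centers = [(10, 10), (50, 50)]
candidates = [
    (0, 12, 4, -2, 6, 0.9),  # votes for (10, 10)
    (0, 48, 44, 2, 6, 0.8),  # votes for (50, 50)
    (1, 9, 16, 1, -6, 0.7),  # votes for (10, 10)
    (1, 9, 15, 1, -5, 0.6),  # same joint, same person, lower score
]
people = group_keypoints(centers, candidates)
print(people)  # [{0: (12, 4, 0.9), 1: (9, 16, 0.7)}, {0: (48, 44, 0.8)}]
```

Matching votes to the nearest center is the simplest criterion; a multi-layer offset map as described in the abstract would presumably refine the offsets before this step.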
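The learnable adjacency matrix in contribution (2) can be illustrated with a single-frame graph convolution whose adjacency is a fixed, normalized skeleton graph plus a learned offset matrix, similar in spirit to the adaptive graph convolutions of 2s-AGCN. This is an assumption, since the abstract does not give MsDy-GCN's exact formulation, and the helper names below are hypothetical. The forward pass computes Y = (Â + B) X W with Â = D^{-1/2}(A + I)D^{-1/2}.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(edges, num_nodes):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def adaptive_graph_conv(x, a_hat, b, w):
    """One spatial graph convolution Y = (A_hat + B) X W for a single frame.

    x: V x C_in joint features; a_hat: fixed V x V normalized adjacency;
    b: V x V learnable offset (hypothetical; updated by training in a real model);
    w: C_in x C_out weight matrix.
    """
    v = len(a_hat)
    a = [[a_hat[i][j] + b[i][j] for j in range(v)] for i in range(v)]
    return matmul(matmul(a, x), w)


# Three-joint chain 0-1-2 with one-dimensional features.
a_hat = normalized_adjacency([(0, 1), (1, 2)], 3)
x = [[1.0], [0.0], [0.0]]  # only joint 0 carries a signal
w = [[1.0]]
b_zero = [[0.0] * 3 for _ in range(3)]
b_learned = [[0.0] * 3 for _ in range(3)]
b_learned[2][0] = 0.3  # a learned link from joint 0 to joint 2

y_fixed = adaptive_graph_conv(x, a_hat, b_zero, w)       # joint 2 receives nothing
y_adaptive = adaptive_graph_conv(x, a_hat, b_learned, w)  # joint 2 now receives 0.3
```

The example shows why a learned offset helps: joints 0 and 2 are not physically connected, yet B lets information flow between them. In a trainable model B would be a parameter updated by backpropagation (for example an `nn.Parameter` in PyTorch), and a temporal convolution across frames would follow this spatial step; both are omitted here.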
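The second-order and motion branches in contribution (2) are commonly derived from the joint coordinates alone: second-order (bone) features as the vector from each joint's parent to the joint, and motion features as the coordinate change between consecutive frames. This sketch assumes those common definitions and a hypothetical `parents` list describing the skeleton; the thesis may define the inputs differently.

```python
def bone_stream(frames, parents):
    """Second-order (bone) features: vector from each joint's parent to the joint.

    frames: T x V list of coordinate tuples; parents[v] is the parent of joint v,
    and the root points to itself so its bone vector is zero.
    """
    return [
        [tuple(c - p for c, p in zip(frame[v], frame[parents[v]])) for v in range(len(frame))]
        for frame in frames
    ]


def motion_stream(frames):
    """Motion features: coordinate change from frame t to frame t + 1.

    The last frame has no successor, so it is padded with zeros to keep length T.
    """
    out = []
    for t, frame in enumerate(frames):
        if t + 1 < len(frames):
            nxt = frames[t + 1]
            out.append([tuple(n - c for n, c in zip(nxt[v], frame[v])) for v in range(len(frame))])
        else:
            out.append([tuple(0.0 for _ in joint) for joint in frame])
    return out


# Two frames of a three-joint chain (root 0 -> 1 -> 2) in 2-D.
frames = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)],
]
parents = [0, 0, 1]
bones = bone_stream(frames, parents)
motion = motion_stream(frames)
```

Each stream keeps the original T x V x C shape, so the same graph convolution backbone can be reused for every branch, with the branch outputs fused afterward.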
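The Top-1 and Top-5 figures reported for Kinetics-Skeleton count a sample as correct when its true label is among the k highest-scoring classes. A minimal sketch of that metric (the function name is illustrative):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


scores = [
    [0.1, 0.7, 0.2],  # true class 1 ranked first
    [0.5, 0.3, 0.2],  # true class 1 ranked second
    [0.2, 0.3, 0.5],  # true class 0 ranked third
]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, 1)  # 1 of 3 correct
top2 = top_k_accuracy(scores, labels, 2)  # 2 of 3 correct
```

Top-5 is always at least as high as Top-1, which is why the two Kinetics-Skeleton numbers differ so much on a dataset with hundreds of classes.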
Keywords: key point positioning, skeleton data, action recognition, graph convolutional network, attention mechanism