Object detection plays an important role in face recognition, instance segmentation, crowd counting, and human pose recognition. To address the low detection accuracy on single frame images in video, this paper proposes a keypoint detection algorithm, Point-GAT. The algorithm optimizes the backbone network with shortcut connections to solve the learning degradation problem caused by increasing network depth. At the same time, deconvolution and feature fusion are used to improve the detection of small-scale objects. Finally, a graph classification method, MS-GAT, is proposed, which maps the semantic relationships between categories through a directed weighted graph, achieves better classification, and further improves the detection accuracy on single frame images in video. Because a single frame image lacks context consistency information, it is not enough to detect occluded or blurred objects. Building on the Point-GAT algorithm, this paper introduces a spatio-temporal attention mechanism to model the relationships and spatio-temporal characteristics of keypoints: it decouples the temporal characteristics of three consecutive frames, constructs a single spatio-temporal graph for each pair of adjacent frames, and then convolves all the single spatio-temporal graphs to obtain the spatio-temporal information of each keypoint in both the time and space dimensions. The improved algorithm can obtain the semantic categories and the temporal and spatial correlations between detected objects from multi-frame images, achieve better localization and classification of objects in video, handle object occlusion and image blur, and improve the accuracy of video object detection. The proposed algorithm is tested on the COCO dataset and the ImageNet VID dataset. Experimental results show that the algorithm achieves the highest accuracy, 48.3% AP on the COCO dataset and 85% AP on the ImageNet VID dataset.
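
The shortcut connection mentioned above can be illustrated with a minimal sketch. It is not the Point-GAT backbone itself, whose layers are not described here; it only shows the general residual form y = x + F(x), with F assumed to be two small linear maps and a ReLU. The names `linear`, `relu`, `residual_block`, `w1`, and `w2` are hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Compute y = x + W2 * relu(W1 * x).

    The `x +` term is the shortcut connection: the input bypasses the
    learned branch and is added back to the branch output.
    """
    branch = linear(w2, relu(linear(w1, x)))
    return [xi + bi for xi, bi in zip(x, branch)]


# Illustrative weights (assumed values, not taken from the paper).
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

x = [1.0, -2.0]
y_identity_branch = residual_block(x, identity, identity)
y_zero_branch = residual_block(x, identity, zeros)
```

When the learned branch contributes nothing (`zeros` as the second weight matrix), the block passes its input through unchanged. This is the usual argument for why adding such blocks to a deeper network should not make it harder to fit than a shallower one.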