| Video surveillance creates a huge demand for computer vision technology, and target tracking based on computer vision is therefore widely used in many fields, such as intelligent monitoring, autonomous driving, education, and human-computer interaction. Deep learning has solved many common problems of traditional algorithms and has, to a certain extent, advanced computer vision technologies. Even so, tracking still faces several problems: background interference, object occlusion, and neglect of the relationship between the people and the scene. In addition, joint attention is easily left unanalyzed during training. Together, these factors lead to poor detection accuracy and unsatisfactory detection results. To address these problems, this work studies in depth target tracking algorithms based on the attention mechanism and joint attention. The specific contents are as follows. First, convolutional neural networks (CNNs) are the essential model underlying target tracking algorithms, so the basic knowledge of CNNs is introduced. This includes a detailed description of the convolutional layers, activation functions, and pooling layers that make up these networks, followed by a comprehensive introduction to the basic (backbone) networks. For a given image, a deep neural network extracts head features and head positions while also modeling the complex interaction of information between the scene and the head. This helps the model learn faster and capture more of the scene features that are likely to be of interest given the attributes of the head. Second, interfering information in the scene can leave the tracker unable to determine that the target has disappeared, and under occlusion, background interference is easily introduced into the tracking model. An enhanced attention module is therefore proposed that can filter out depth information. It strengthens the detection and tracking of the main scene and people, prevents background interference from accumulating in the tracking model in scenes with severe occlusion, and re-tracks the target object after the occlusion ends. Finally, the attention of the other people in the scene is also taken into account in the region of interest, which augments the standard saliency model with attention push, and the resulting heatmap is overlaid on the image for visualization. Experiments show that the method achieves an AUC of 88.9% in visual object detection and 80.1% in joint attention detection, improvements of 1.1% and 2%, respectively. Applied to teaching scenarios, the method can effectively infer the dynamic attention and joint attention of students and teachers in videos. |
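The abstract introduces convolutional layers, activation functions, and pooling layers as the building blocks of the CNNs used for tracking. Below is a minimal pure-Python sketch of one such stage. It assumes a single-channel input, a valid (unpadded) stride-1 convolution, ReLU as the activation, and non-overlapping 2x2 max pooling. The kernel and image are hypothetical toy values; real networks learn many multi-channel kernels and are built in a deep learning framework.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding), stride-1 2-D cross-correlation, as computed by a CNN conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU activation: max(0, v)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool2d(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling with a size x size window (stride = size)."""
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [
        [
            max(x[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    # Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
    image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    # Hypothetical 2x2 kernel that responds to left-to-right intensity increases.
    edge = [[-1.0, 1.0], [-1.0, 1.0]]
    print(max_pool2d(relu(conv2d(image, edge))))
```

Running the script prints `[[2.0, 0.0], [2.0, 0.0]]`. The kernel responds only where the intensity jumps from 0 to 1, ReLU discards negative responses, and pooling keeps the strongest response in each 2x2 window while halving the spatial resolution.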
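The abstract does not describe the internals of the enhanced attention module. The sketch below is therefore not the module proposed in this work. It illustrates the general idea of attention that down-weights weak, background-like responses with a simplified spatial attention gate in the style of CBAM. Each location is scored from the channel-wise mean and maximum, the score is squashed with a sigmoid, and the result rescales every channel. The scalar weights `w_avg`, `w_max`, and `bias` are hypothetical stand-ins for parameters that a real network would learn (CBAM itself uses a convolution here).

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def spatial_attention(features: List[Matrix], w_avg: float = 1.0,
                      w_max: float = 1.0, bias: float = 0.0) -> List[Matrix]:
    """Rescale every channel of a C x H x W feature map by one H x W mask.

    mask[i][j] = sigmoid(w_avg * mean_c + w_max * max_c + bias), where mean_c and
    max_c are taken across channels at location (i, j). Low-scoring locations
    (e.g. background) are attenuated; high-scoring ones pass almost unchanged.
    """
    h, w = len(features[0]), len(features[0][0])
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            column = [ch[i][j] for ch in features]
            score = w_avg * sum(column) / len(column) + w_max * max(column) + bias
            mask[i][j] = sigmoid(score)
    # Apply the same spatial mask to every channel.
    return [[[ch[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for ch in features]
```

With a negative `bias`, a location that is weak in all channels receives a mask value well below 0.5. A location that is strong in some channel keeps most of its magnitude. This is the qualitative behavior one would want from a module meant to stop background responses from dominating the tracker.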
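The final step overlays a heatmap on the image for visualization. One common approach is min-max normalization of the heatmap followed by alpha blending. The sketch below assumes a grayscale image with values in [0, 255] and a heatmap of the same height and width. Instead of a full colormap such as jet, it uses a hypothetical two-color blue-to-red ramp; the function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]
RGB = Tuple[int, ...]


def normalize(heatmap: Matrix) -> Matrix:
    """Min-max scale a heatmap to [0, 1]; a constant map becomes all zeros."""
    lo = min(min(row) for row in heatmap)
    hi = max(max(row) for row in heatmap)
    if hi == lo:
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in heatmap]


def overlay(gray: Matrix, heatmap: Matrix, alpha: float = 0.4) -> List[List[RGB]]:
    """Alpha-blend a heatmap onto a grayscale image, returning RGB pixels.

    Low heat maps to blue (0, 0, 255) and high heat to red (255, 0, 0); this
    linear ramp is a stand-in for a real colormap.
    """
    norm = normalize(heatmap)
    result = []
    for g_row, h_row in zip(gray, norm):
        out_row = []
        for g, h in zip(g_row, h_row):
            color = (255.0 * h, 0.0, 255.0 * (1.0 - h))
            # Blend each color channel with the gray value and round to an integer.
            out_row.append(tuple(round((1.0 - alpha) * g + alpha * c) for c in color))
        result.append(out_row)
    return result
```

For example, `overlay([[100.0, 100.0]], [[3.0, 7.0]], alpha=0.5)` returns `[[(50, 50, 178), (178, 50, 50)]]`: the low-heat pixel is tinted blue and the high-heat pixel red.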
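The results are reported as AUC, but the abstract does not state the exact evaluation protocol. Under the standard definition of ROC AUC, the score equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The minimal sketch below computes it by direct pairwise comparison. How predicted heatmaps are turned into labels and scores in this work is not specified, so the inputs are assumed to be flat lists of binary labels and real-valued scores.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive samples, 0 for negative samples.
    scores: higher means "more likely positive". Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns `0.75`, because three of the four positive-negative pairs are ordered correctly. The pairwise loop costs O(PN) for P positives and N negatives. For large inputs, a rank-based computation after sorting is preferable.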