| Human keypoint detection is a fundamental task in computer vision, crucial for applications such as action recognition, human-computer interaction, and video surveillance. However, existing keypoint detection algorithms focus primarily on visible-light images. Under low-light conditions, the image quality of visible-light cameras degrades severely, which in turn reduces detection accuracy. In contrast, infrared cameras capture thermal radiation and are therefore unaffected by illumination, which gives infrared images an advantage in low-light conditions. Keypoint detection on infrared images is thus of considerable practical value in domains such as behavior analysis, safe driving, and elderly care. This paper aims to apply keypoint detection algorithms effectively to infrared images and to improve their adaptability under low-light conditions. The specific research contents are as follows. 1. Review domestic and international research on keypoint detection algorithms for visible-light images, analyzing their strengths and weaknesses to lay a foundation for subsequent research. 2. Design an improved keypoint detection solution for infrared images based on an enhanced Simple Baseline. The solution uses YOLO to detect human instances, which are then fed into a human keypoint detection network. To address the blurriness and low resolution characteristic of infrared images, several enhancement strategies are employed, including multi-resolution feature fusion, channel attention, and spatial attention. In addition, depthwise separable convolutions are used to reduce the parameter count and increase inference speed. Furthermore, a dataset for multi-person keypoint detection in infrared images is created, and experiments on it demonstrate the advantages of this method for multi-person keypoint detection in infrared images. 3. Investigate a Transformer-based keypoint detection network for infrared images. Because traditional convolutional methods have limitations when applied to infrared images, a model named FEPose is proposed. FEPose uses the self-attention mechanism of the Transformer encoder to model spatial dependencies within the image, improving keypoint perception and mitigating the effect of low resolution on detection accuracy. Moreover, Filter and Enhance layers are designed to suppress background interference and strengthen the grayscale response of human subjects. To bring keypoint detection closer to real-world scenarios, a dataset of infrared multi-person keypoints in natural scenes is constructed, and experiments show that the model significantly improves human keypoint detection on infrared images. |
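The top-down pipeline in item 2 (detect people with YOLO, then run a keypoint network on each detected instance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: `detect` and `estimate` are hypothetical callables standing in for the detector and the pose network, and the heatmap decoding is a plain argmax that omits the affine cropping and sub-pixel refinement that Simple Baseline-style pipelines commonly add.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, width, height) in image pixels
Heatmap = Sequence[Sequence[float]]      # Hh x Wh scores for one keypoint


def decode_heatmap(heatmap: Heatmap, box: Box) -> Tuple[float, float, float]:
    """Return (x, y, score) of the heatmap peak in original-image coordinates."""
    hh, wh = len(heatmap), len(heatmap[0])
    # Peak location (row i, column j) and its score
    score, i, j = max((v, i, j) for i, row in enumerate(heatmap) for j, v in enumerate(row))
    x0, y0, bw, bh = box
    # Assumption: each heatmap cell covers an equal sub-rectangle of the box;
    # map the cell centre back into image coordinates.
    x = x0 + (j + 0.5) * bw / wh
    y = y0 + (i + 0.5) * bh / hh
    return x, y, score


def top_down(image, detect: Callable, estimate: Callable) -> List[List[Tuple[float, float, float]]]:
    """detect(image) -> person boxes; estimate(image, box) -> one heatmap per keypoint.
    Both callables are hypothetical stand-ins for the detector and the pose network."""
    return [[decode_heatmap(hm, box) for hm in estimate(image, box)] for box in detect(image)]


if __name__ == "__main__":
    hm = [[0.0] * 4 for _ in range(4)]
    hm[1][2] = 0.9  # peak at row 1, column 2
    poses = top_down(None, lambda img: [(10, 20, 40, 80)], lambda img, box: [hm])
    print(poses)  # [[(35.0, 50.0, 0.9)]]
```

Separating detection from pose estimation lets each person be processed at a fixed crop resolution, which is one reason top-down designs are popular when people appear small or blurred.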
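The abstract does not specify how features at different resolutions are fused. A common pattern, assumed here purely for illustration, is to upsample the coarser map and add it element-wise to the finer one. The sketch works on single-channel maps with nearest-neighbour upsampling; real networks typically also apply 1x1 convolutions to match channel counts and often use bilinear upsampling.

```python
from typing import List

Map = List[List[float]]  # single-channel H x W feature map


def upsample_nearest(fmap: Map, factor: int) -> Map:
    """Nearest-neighbour upsampling: each value becomes a factor x factor block."""
    return [[v for v in row for _ in range(factor)] for row in fmap for _ in range(factor)]


def fuse(high: Map, low: Map) -> Map:
    """Element-wise sum of a high-resolution map and an upsampled low-resolution map."""
    factor = len(high) // len(low)
    if factor * len(low) != len(high) or factor * len(low[0]) != len(high[0]):
        raise ValueError("resolutions must differ by the same integer factor")
    up = upsample_nearest(low, factor)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(high, up)]


if __name__ == "__main__":
    high = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
    low = [[0.5, 2.0]]
    print(fuse(high, low))  # [[1.5, 1.5, 3.0, 3.0], [1.5, 1.5, 3.0, 3.0]]
```

The fine map keeps precise localisation while the coarse map contributes broader context, which is the usual motivation for fusing resolutions on blurry, low-resolution inputs.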
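Channel and spatial attention are named but not detailed in the abstract. The sketch below assumes the widely used forms: a squeeze-and-excitation style channel gate (global average pooling, two fully connected layers, sigmoid) and a simplified CBAM-style spatial gate. To stay short, it replaces CBAM's 7x7 convolution with a per-pixel weighted sum of the channel mean and max; all weights are placeholder arguments, not trained values.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w1, w2):
    """SE-style channel gate on fmap[c][h][w].
    w1: (C/r) x C reduction weights, w2: C x (C/r) expansion weights (biases omitted)."""
    # Squeeze: global average pooling per channel
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives one score in (0, 1) per channel
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    scale = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight every value of a channel by that channel's score
    return [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, scale)]


def spatial_attention(fmap, w_avg, w_max, bias=0.0):
    """Simplified CBAM-style spatial gate: per-pixel sigmoid of a weighted sum of the
    channel-wise mean and max (a 1x1 mix instead of CBAM's 7x7 convolution)."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(H):
        for j in range(W):
            vals = [fmap[c][i][j] for c in range(C)]
            a = sigmoid(w_avg * sum(vals) / C + w_max * max(vals) + bias)
            for c in range(C):
                out[c][i][j] = fmap[c][i][j] * a  # same gate for all channels at (i, j)
    return out


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]  # C=2, H=W=2
    gated = channel_attention(fmap, [[1.0, 1.0]], [[1.0], [1.0]])
    print(spatial_attention(gated, 1.0, 1.0))
```

The channel gate decides *which* feature channels matter, while the spatial gate decides *where* in the image to look; applying them in sequence is the CBAM ordering.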
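The parameter saving from depthwise separable convolutions can be checked with a short count. Ignoring biases, a k x k standard convolution has k·k·C_in·C_out weights, while a k x k depthwise convolution followed by a 1x1 pointwise convolution has k·k·C_in + C_in·C_out, a ratio of 1/C_out + 1/k². The layer size below (3x3 kernel, 256 input and output channels) is an illustrative assumption, not a value taken from the thesis.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k standard convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv plus a 1 x 1 pointwise conv (biases omitted)."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 convolution mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    dws = depthwise_separable_params(k, c_in, c_out)
    print(std, dws, round(dws / std, 4))  # 589824 67840 0.115
```

For this layer the separable version needs roughly 11.5% of the weights; the multiply-accumulate count per output pixel shrinks by the same ratio, which is where the speed gain comes from.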
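The Transformer encoder in FEPose relies on self-attention to relate every spatial position to every other one. The following single-head, scaled dot-product sketch flattens a C x H x W feature map into H·W tokens and computes softmax(QKᵀ/√d_k)V. The projection matrices are placeholder arguments; multi-head attention, positional encodings, and the Filter and Enhance layers are deliberately left out because the abstract does not describe them.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_positions(fmap: List[Matrix]) -> Matrix:
    """Turn a C x H x W feature map into H*W tokens, each a C-dim vector."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][i][j] for c in range(C)] for i in range(H) for j in range(W)]


def self_attention(tokens: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention.
    wq, wk, wv: d x d_k projection matrices (placeholder values, not trained weights)."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d_k = len(wq[0])
    # scores[i][j]: similarity between position i's query and position j's key
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    # Each output token is a weighted mix of the values at all positions
    return matmul(weights, v), weights


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 1.0]]]  # C=2, H=W=2
    tokens = flatten_positions(fmap)
    eye = [[1.0, 0.0], [0.0, 1.0]]
    out, attn = self_attention(tokens, eye, eye, eye)
    print(len(out), [round(sum(r), 6) for r in attn])  # 4 [1.0, 1.0, 1.0, 1.0]
```

Because each position's attention weights span all H·W positions, a keypoint that is blurred locally can still draw on context from elsewhere in the image; the trade-off is a cost quadratic in the number of positions.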