| Traffic lights play an important role in directing traffic and guiding drivers in urban road traffic systems. Accurate and rapid traffic light detection is therefore important to the development of urban traffic. However, traffic lights are small compared with other objects, and real road scenes are complex and varied, which makes it difficult to achieve high detection accuracy in realistic scenarios. At present, convolutional neural networks and attention mechanisms are widely used in computer vision tasks such as object detection because of their strong performance, and they have achieved good results. Therefore, this thesis uses convolutional neural networks and attention mechanisms to study traffic light detection. The main contributions of the thesis are as follows: (1) DCA-YOLOX, a traffic light detection algorithm that uses dilated convolution to fuse context, is proposed. The algorithm uses a multi-head self-attention mechanism to improve the feature extraction network and dilated convolution to improve the feature fusion network. Incorporating the multi-head self-attention module into the feature extraction network enriches the global information of the deep feature maps and provides sufficient context for feature fusion. The multi-branch dilated convolution module connects feature maps at different scales, so that the high-resolution feature maps can fuse the contextual information of the deeper, low-resolution feature maps and thereby combine global and local features. DCA-YOLOX is trained and tested on the LISA traffic lights dataset and the Bosch Small Traffic Lights Dataset, and the experimental results show that introducing the multi-head self-attention module and the multi-branch dilated convolution module improves the detection accuracy for traffic lights. (2) Multi-scale DETR, a traffic light detection algorithm based on a deformable attention mechanism, is proposed. The deformable attention mechanism reduces the computation of DETR, and multi-scale fusion improves the detection accuracy for traffic lights. Multi-scale DETR uses the deformable attention mechanism so that each reference point attends to only a few surrounding key sampling points. The encoder and decoder process the feature maps with multi-scale feature fusion, fusing features at four scales. Following the multi-head attention principle, two attention heads are assigned to each scale of the feature map to fuse features across scales. Multi-scale DETR is trained and tested on the LISA traffic lights dataset and the Bosch Small Traffic Lights Dataset, and the experimental results show that introducing the deformable attention mechanism and multi-scale fusion effectively improves the detection accuracy for traffic lights. |
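
The idea that each reference point attends to only a few sampling points can be illustrated with a minimal sketch. It is not the thesis implementation: it covers a single scale and a single attention head on a toy single-channel feature map, and the names `bilinear_sample`, `deformable_attention`, `offsets`, and `scores` are hypothetical. In a trained model the offsets and attention scores would be predicted from the query feature; here they are passed in by hand, which is an assumption of the example.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D grid (list of rows) at fractional (x, y); outside reads as 0."""
    height, width = len(feature_map), len(feature_map[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # Blend the four neighbouring cells, weighted by distance to the sample.
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < width and 0 <= yi < height:
                value += wx * wy * feature_map[yi][xi]
    return value


def softmax(scores):
    """Turn raw scores into attention weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(feature_map, reference_point, offsets, scores):
    """Attend from one reference point to a few sampled locations only.

    offsets and scores are supplied directly (hypothetical inputs); a real
    model would predict them from the query feature.
    """
    ref_x, ref_y = reference_point
    weights = softmax(scores)
    output = 0.0
    for (dx, dy), weight in zip(offsets, weights):
        output += weight * bilinear_sample(feature_map, ref_x + dx, ref_y + dy)
    return output


# Toy 4x4 single-channel feature map whose value equals 4 * row + column.
feature_map = [[float(4 * r + c) for c in range(4)] for r in range(4)]
offsets = [(-1.0, 0.0), (0.5, 0.5), (1.0, -1.0)]  # hypothetical sampling offsets
scores = [0.0, 0.0, 0.0]  # equal scores give uniform attention weights
result = deformable_attention(feature_map, (1.0, 2.0), offsets, scores)
print(result)  # approximately 8.5, the mean of the samples 8.0, 11.5 and 6.0
```

The query reads three sampled locations instead of all sixteen cells of the map, which is the source of the reduced computation compared with dense attention. Extending the sketch to several scales and heads would repeat the same sampling on each feature map and combine the results.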