| As the total mileage of constructed highways continues to grow, road maintenance is becoming increasingly important, and the focus of highway management is gradually shifting from planning and construction to maintenance. Pavement distress detection is an important part of road maintenance, and cracks, as the main type of pavement distress, provide guidance for maintenance measures. Failure to detect and repair cracks in a timely manner increases highway maintenance costs and significantly shortens service life, so timely crack detection is of great significance for ensuring transportation safety and improving economic benefits. Early pavement crack detection relied mainly on manual inspection, which is costly and inefficient, and its results are easily affected by the subjectivity of the inspectors. It is therefore necessary to design an automatic, accurate, and efficient pavement crack detection method. Compared with traditional methods, deep learning-based pavement crack detection methods have significant advantages, especially deep convolutional neural networks (CNNs), which can automatically learn crack features and achieve end-to-end automated detection. This paper therefore studies CNN-based pavement crack detection and deep learning algorithms in depth and summarizes the challenges that CNNs face in crack detection. The locality of convolution operations, complex pavement detection scenes, and the heterogeneous features and topological structure of cracks all degrade pavement crack detection performance. In particular, the accuracy of crack edge detection is limited when uneven illumination leads to low contrast and when the background contains noise. The detection model therefore needs to model strong long-range dependencies. Convolution excels at extracting local information but is insufficient for modeling long-range dependencies. The Transformer makes 
up for this shortcoming of CNNs: through its self-attention mechanism, it models long-range dependencies and global information between input pixels well. However, it lacks the ability to extract local information effectively. To address these issues and improve detection performance, this paper combines the strength of CNNs in processing low-level visual information with the strength of the Transformer in modeling long-range dependencies. An effective fusion mechanism is proposed, and two pavement crack detection methods based on the Transformer and deep CNNs are designed to meet different detection needs. (1) To meet the requirement of rapid crack classification and localization, a pavement crack detection model that integrates Swin Transformer and YOLOX is proposed. The model addresses the insufficient long-range dependency modeling of pure CNNs by introducing Swin Transformer modules, which enhance crack feature extraction in complex scenes and better preserve the edge details of input images, improving crack detection accuracy. A global attention guidance module is designed to improve feature fusion in the feature pyramid by guiding high-level semantic information into low-level spatial detail information, which effectively improves feature fusion for multi-class and multi-scale cracks. α-IoU-NMS is applied in the post-processing stage to improve the detection accuracy of occluded and overlapping objects. Experimental results show that, while maintaining real-time performance, the model improves mAP by 3.37% over the original YOLOX model, providing a reference for low-cost pavement crack detection on urban roads. (2) To meet the requirement of precise crack segmentation, a pixel-level pavement crack detection algorithm based on Swin Transformer and U-Net is proposed. The model tackles the low accuracy in detecting cracks with complex topological 
scales in complex scenes. After the high-level semantic feature map is encoded, the model introduces a multi-scale atrous convolution module that integrates Swin Transformer to capture crack edge features accurately in complex scenes and obtain more precise crack edge segmentation. To address the imbalance in crack datasets caused by the large discrepancy between the numbers of crack and background pixels, a loss function that combines Dice Loss and Focal Loss is proposed to focus learning on difficult samples. To mitigate the increase in model parameters and the reduction in inference speed caused by these improvements, the Ghost module is used to remove redundant feature maps in the decoder, yielding a lightweight model. Experimental results show that, compared with the original U-Net model, the improved model increases mIoU by 1.27% and inference speed by 3.56 FPS with only a small increase in the number of parameters. This pixel-level detection model generates analysis results for the whole image, which benefits the subsequent fine-grained classification of cracks and the quantitative measurement of their severity. In summary, this study proposes two detection methods that combine deep CNNs and the Transformer. They provide new ideas and methods for pavement crack detection and recognition, and offer technical and theoretical support for the development of automated detection devices. This paper also explores the fusion mechanism between CNNs and the Transformer, which may serve as a reference for applying such fusion in industrial and agricultural production. |
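
The combined Dice and Focal loss mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything specific here is an assumption: the equal weighting (`dice_weight=0.5`), the focal parameters (`gamma=2.0`, `alpha=0.25`), the smoothing constant, and the function names are illustrative choices, not the author's settings. The sketch works on a flattened list of per-pixel crack probabilities and binary labels.

```python
import math


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - overlap ratio between predicted and true crack pixels."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss; the (1 - p_t)^gamma factor down-weights easy pixels."""
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p             # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha  # class-balancing weight
        losses.append(-alpha_t * (1.0 - p_t) ** gamma * math.log(p_t))
    return sum(losses) / len(losses)


def combined_loss(probs, targets, dice_weight=0.5):
    """Weighted sum of Dice and focal loss (the weighting is an assumption)."""
    return (dice_weight * dice_loss(probs, targets)
            + (1.0 - dice_weight) * focal_loss(probs, targets))


# One crack pixel among three background pixels (an imbalanced toy example).
labels = [1, 0, 0, 0]
confident = [0.9, 0.1, 0.1, 0.1]   # mostly correct prediction
uncertain = [0.4, 0.3, 0.3, 0.3]   # misses the crack pixel
print(combined_loss(confident, labels), combined_loss(uncertain, labels))
```

In this sketch the Dice term measures region overlap and is therefore insensitive to the large number of background pixels, while the focal term reduces the contribution of pixels that are already classified confidently, so the hard pixels dominate the gradient.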