The steel industry plays a pivotal role in industrial production. Strip steel, one of its most important products, is widely used in aerospace, machinery, automotive, and other fields. During production, factors such as the environment or the manufacturing process cause defects such as scratches, patches, and marks to appear on the strip surface, which degrade product performance and quality. Locating and classifying defects quickly and accurately is therefore particularly important for enterprises that need to control product quality during strip production. Manual inspection relies mainly on workers' subjective judgment, is costly, and is susceptible to environmental influences; automatic defect detection based on machine vision is a good way to avoid these problems. However, traditional machine vision methods require handcrafted features and cannot adapt to industrial environments in which defect types vary. With the development of deep learning, multi-scale feature fusion has been applied to object detection tasks with good results. This paper applies multi-scale feature fusion to strip steel surface defect detection to improve the detection performance of the model. The main research results are as follows:

(1) A nested multi-scale feature fusion model for strip steel surface defect detection (NMFNet) is proposed. The model consists of an encoder, a decoder, and a feature refinement network. The encoder uses a fully convolutional neural network and a lightweight attention mechanism to extract features at different scales, containing low-level spatial detail information and high-level contextual semantic information. The decoder adopts a U-shaped structure that gradually fuses the multi-scale features obtained in the encoder stage and produces a preliminary predicted saliency map. The feature refinement network adopts a lightweight structure to further refine the coarse prediction map output by the encoder-decoder stage and produces the final predicted saliency map. In addition, embedding dilated convolution and a fusion loss allows NMFNet to capture rich detail information without a large increase in computation. Experimental results show that NMFNet outperforms other methods, effectively suppressing background noise and producing strip surface defects with clear boundaries.

(2) A strip surface defect detection model based on multi-scale self-attention fusion (CTFNet) is proposed. The model obtains multi-scale feature maps either through feature scale transformation or through feature extraction operations that include scale transformation. For each feature map, a convolution branch and a Transformer branch extract features at different scales, and a feature fusion module then fuses these features, which contain both global and local information, and makes the prediction. Specifically, the convolution branch uses a pre-trained ResNet-50 to extract multi-scale features containing local detail information, and the Transformer branch uses the serialized representation of the self-attention mechanism to further extract features containing global contextual association information. Finally, the feature fusion module fuses the multi-scale features to achieve accurate detection of strip surface defects. Experimental results show that CTFNet performs well compared with existing models and can generate saliency maps with clear boundaries and precise semantic information.
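
The remark that dilated convolution captures more detail without much extra computation can be illustrated with a small sketch. The code below is not NMFNet's implementation, which operates on 2-D feature maps; it is a minimal pure-Python, 1-D illustration, and the function names `dilated_conv1d` and `receptive_field` are hypothetical. It shows that a kernel with a fixed number of weights covers a wider input span as the dilation rate grows, while the number of multiplications per output value stays the same.

```python
def receptive_field(kernel_size, dilation):
    """Input span covered by one output value of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """1-D dilated cross-correlation with no padding (illustrative only).

    Each output value uses len(kernel) multiplications regardless of the
    dilation rate; only the spacing between sampled inputs changes.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = receptive_field(len(kernel), dilation)
    outputs = []
    for start in range(len(signal) - span + 1):
        # Sample the input every `dilation` positions under the kernel.
        value = sum(
            weight * signal[start + i * dilation]
            for i, weight in enumerate(kernel)
        )
        outputs.append(value)
    return outputs


if __name__ == "__main__":
    signal = list(range(10))
    kernel = [1, 1, 1]  # three weights in every case below
    for rate in (1, 2, 4):
        print(rate, receptive_field(len(kernel), rate),
              dilated_conv1d(signal, kernel, rate))
```

With three weights, dilation rates of 1, 2, and 4 cover input spans of 3, 5, and 9 samples respectively, which is the sense in which context grows without adding parameters.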