| Convolutional neural networks are an important part of deep learning and are widely used in computer vision tasks. Current research hotspots include parameter optimization, structure optimization, and adaptive convolutional neural networks. Adaptive convolutional neural networks are also called "dynamic" neural networks. Their core idea is adaptive inference: the network dynamically adjusts its own structure and parameters according to each input in order to obtain better performance, such as higher inference efficiency or higher accuracy. This adaptability can be reflected in three aspects: sample adaptation, spatial adaptation, and temporal adaptation. This paper focuses on sample adaptation and proposes three adaptive modules to improve the performance of convolutional neural network models on various computer vision tasks. The main work covers the following three aspects. (1) The convolution module is the core component of a convolutional neural network. In practice, the stride of the convolution operation is mostly chosen manually or through some search method, a process known as "parameter tuning". This approach either makes it difficult to determine an appropriate stride or requires a lot of time, so neither its efficiency nor its effectiveness is ideal. To address this issue, this paper views the effect of stride on the convolution result from a sampling perspective and proposes a convolution module with an adaptive-stride effect. The module does not optimize the stride parameter directly; instead, it performs adaptive sampling (filtering) on the convolutional feature map obtained with a stride of 1, indirectly achieving the effect of convolution with an adaptive stride. When the convolution module is applied to existing representative models, the experimental results
show that in classification tasks, the accuracy of AlexNet, MobileNet, and ResNet can be improved by 2%, 1.16%, and 2.61%, respectively; in object detection tasks, the mAP (mean average precision) of YOLOX and YOLOv6 can be improved by 0.6% and 0.7%, respectively; and in image segmentation tasks, UNet's Acc_cls and mIU can be increased by 4.04% and 3.24%, respectively. (2) Besides convolution, pooling is another important part of a convolutional neural network. Existing pooling methods such as max pooling and average pooling have limited feature extraction capability, which makes it difficult to extract the various features of an image effectively. To address this, a new pooling method, unsigned-min pooling, is proposed. It takes the absolute value of the minimum value in each pooling region of the feature map and thereby preserves minimum-value feature information (often edge features). To some extent, unsigned-min pooling complements existing pooling methods. On this basis, this paper considers the dynamic fusion of max pooling, average pooling, and unsigned-min pooling layers and proposes two adaptive hybrid pooling modules. These modules adaptively select a hybrid pooling method based on the input feature map, helping the network extract useful features. When the hybrid pooling modules are applied to existing representative models, the experimental results show that in classification tasks, the accuracy of AlexNet and ResNet can be improved by 2.93% and 1.21%, respectively, and in object detection tasks, the mAP of YOLOv6 can be increased by 1.11%. (3) Multi-scale fusion and the Fourier transform are two commonly used methods in image processing. Multi-scale fusion combines low-level and high-level semantic information and improves model performance when an image contains both large and small target objects. The Fourier transform maps images into the frequency
domain for analysis and processing. This paper combines the Fourier transform with multi-scale fusion and proposes a Fourier-transform-based module for adaptive image feature extraction and multi-scale fusion. The module extracts semantic feature information at different scales by applying different crops in the frequency domain, then selects and filters that information through an adaptive processing component to improve the effectiveness of multi-scale fusion. When the proposed multi-scale fusion module is applied to existing representative models, the experimental results show that in classification tasks, the accuracy of AlexNet and ResNet can be improved by 2.11% and 1.40%, respectively, and in object detection tasks, the mAP of YOLOv6 can be increased by 1.38%. In summary, the main contribution of this paper is the design of adaptive processing modules for convolutional neural networks, covering three aspects: convolution, pooling, and multi-scale fusion. The proposed modules are reasonably general and can be applied to existing convolutional neural network models in a "plug and play" manner. Simulation experiments show that these modules can improve the performance of convolutional neural networks and have practical value and reference significance. |
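
The abstract describes unsigned-min pooling only in words, so the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes non-overlapping square windows on a single 2-D feature map and interprets "the absolute value of the minimum value" as |min(window)|. The function names `unsigned_min_pool2d` and `max_pool2d` and the window-size parameter `k` are hypothetical.

```python
import numpy as np


def _windows(x, k):
    """Split a 2-D map into non-overlapping k x k windows (assumed layout)."""
    h2, w2 = x.shape[0] // k, x.shape[1] // k
    # Drop trailing rows/columns that do not fill a whole window,
    # then expose each window on axes 1 and 3.
    return x[:h2 * k, :w2 * k].reshape(h2, k, w2, k)


def unsigned_min_pool2d(x, k=2):
    """Absolute value of the minimum in each window (assumed reading)."""
    return np.abs(_windows(x, k).min(axis=(1, 3)))


def max_pool2d(x, k=2):
    """Ordinary max pooling over the same windows, for comparison."""
    return _windows(x, k).max(axis=(1, 3))


feature_map = np.array([
    [1.0, -5.0, 2.0, 3.0],
    [0.0, 4.0, -1.0, 6.0],
    [7.0, 8.0, -9.0, 0.0],
    [2.0, 3.0, 1.0, -2.0],
])

pooled_min = unsigned_min_pool2d(feature_map)
pooled_max = max_pool2d(feature_map)
```

On this toy map the bottom-right window contains a strong negative response (-9) that max pooling reduces to 1, while the unsigned-min variant keeps it as 9. This illustrates the sense in which the two pooling methods can complement each other; how the proposed hybrid modules weight them adaptively is not specified in the abstract and is not sketched here.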