| Object detection in aerial images has long been a research hotspot in computer vision. Although many works have achieved remarkable success on a range of topics, researchers have rarely focused on the discrepancy between general deep-learning-based object detection methods and aerial image parsing, since most detection models are designed and analyzed solely for normal image scenarios. An exhaustive comparative analysis of normal and aerial images shows that the main bottleneck hindering improvement in the average precision of current aerial image detection models is the significant inconsistency in shooting conditions between the two types of images, which leads to great variation in aerial object size, extremely small objects in terms of both absolute and relative pixel area, redundant background information, and blurred features of foreground objects. Consequently, detection models tend to yield false positives, and appropriate parameters are difficult to set. These problems lead to the relatively low accuracy and recall of existing detection models and make model performance on the aerial object detection task difficult to improve further. Our research aims to optimize the existing network structure to reduce the impact of these issues by reinforcing contour features, which are more critical in aerial images than in normal images. Drawing on the ability of the Fourier transform to separate the high-frequency and low-frequency components of an image in the frequency domain, our work proposes using a discrete cosine transform module as a plug-in structure in the deep learning model; it encodes and converts the image data by replacing a small number of layers in the shallow convolution modules without changing the end-to-end nature of the whole model. The data are thereby transformed into a form that is more conducive for neural networks to fit contour features, and the depth of the network is reduced to a certain extent, so the range of its receptive field is limited as well. Through exhaustive experiments, this work tests and analyzes the flaws of current convolutional neural network models in object contour feature extraction. To address this defect, this research implements several versions of the discrete cosine transform module on the CPU and GPU and optimizes them step by step. The detection performance of the improved model with the plug-in structure was tested on three commonly used state-of-the-art network structures and compared with the original models on two benchmark datasets. The experiments validate that the optimized network model with the discrete cosine transform plug-in module improves mAP and precision, while the recall rate decreases because the model is less sensitive to noise; the plug-in structure also improves the AP on some small targets with relatively fixed contours. |
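
The frequency separation that motivates the discrete cosine transform module can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the plug-in module evaluated in this work: the function names (`dct_matrix`, `dct2`, `idct2`, `split_frequencies`), the 8x8 block size, and the `u + v < cutoff` mask used to define "low frequency" are all hypothetical choices made for the example.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)     # DC row uses a different scale factor
    return mat


def dct2(block: np.ndarray) -> np.ndarray:
    """2D DCT-II of a block, applied separably along rows and columns."""
    return dct_matrix(block.shape[0]) @ block @ dct_matrix(block.shape[1]).T


def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2; the basis is orthonormal, so the inverse is the transpose."""
    return dct_matrix(coeffs.shape[0]).T @ coeffs @ dct_matrix(coeffs.shape[1])


def split_frequencies(block: np.ndarray, cutoff: int):
    """Split a block into low- and high-frequency parts in the DCT domain.

    Coefficients with u + v < cutoff are treated as low frequency
    (an assumed, illustrative definition); the rest are high frequency.
    """
    coeffs = dct2(block)
    u = np.arange(block.shape[0]).reshape(-1, 1)
    v = np.arange(block.shape[1]).reshape(1, -1)
    low_mask = (u + v) < cutoff
    low = idct2(np.where(low_mask, coeffs, 0.0))
    high = idct2(np.where(low_mask, 0.0, coeffs))
    return low, high


if __name__ == "__main__":
    # An 8x8 block containing a vertical step edge (a simple "contour").
    block = np.zeros((8, 8))
    block[:, 4:] = 1.0
    low, high = split_frequencies(block, cutoff=2)
    print("max |high| along the edge rows:", np.abs(high).max())
```

Because the transform is linear and orthonormal, the two parts sum back to the original block, and a constant block has no high-frequency part at all; the energy of a step edge, by contrast, spreads into the higher-frequency coefficients, which is why the high-frequency part carries contour information.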