| Image object detection is a cornerstone of visual analysis and understanding; it aims to identify the class of every object in an image and to locate each object with a bounding box. With the rapid development of big data, artificial intelligence, and computer vision, object detection algorithms based on deep learning have achieved breakthrough results, and the technology is now widely used in fields such as intelligent healthcare, security monitoring, intelligent transportation, and autonomous driving. Although deep learning algorithms far outperform traditional algorithms in complex scenes, several problems in their network models still limit detection accuracy and efficiency: weak expressive ability of deep multi-scale features, unbalanced sample numbers across categories, low-quality predictions, task-level imbalance in the training process, poor performance of the non-maximum suppression process, and the slow inference speed and high energy consumption of large models. Recent research has made some improvements, but a number of flaws remain. Therefore, targeting four of these problems, multi-scale object detection algorithms based on deep neural networks are analyzed and studied in depth. The main research contents and achievements are as follows: To address the weak expressive ability of deep multi-scale features, a feature pyramid network based on channel information enhancement is proposed. Summary and analysis suggest that the problem is caused by three defects in the feature pyramid network. Existing methods usually rely on experience and intuition to design models that solve only one of these defects. To address the problem in a more targeted manner, a sub-pixel skip fusion module is designed to alleviate the attenuation of channel information, a sub-pixel context enhancement module is designed to alleviate the dilution of feature fusion, and a channel attention 
guidance module is designed to reduce the aliasing effect of the fusion process. The designed network model borrows sub-pixel convolution from super-resolution tasks and introduces contextual information and attention mechanisms. Experimental results show that the proposed method enhances multi-scale features, mitigates the structural problems of the feature pyramid network, and is superior to similar methods in both accuracy and speed. The training imbalance among the scales of a multi-scale detector has received little attention. Multi-scale training can be regarded as multi-task learning. Experiments indicate that the loss value of each scale fluctuates frequently and that the value ranges differ across scales, which leaves some scales insufficiently trained and lowers the overall accuracy of the model. To solve this problem, a dynamic loss optimization algorithm for multi-scale object detection is proposed. Specifically, inspired by uncertainty-based task weighting, an adaptive variance weighting method is designed that dynamically adjusts the weight of each scale by calculating the variance of its loss values, which is more interpretable than weights learned through backpropagation; a reinforcement learning optimization algorithm is then proposed to further study the training imbalance and refine the scheme. Experimental results show that the high-level scales of one-stage detectors are not fully trained. The proposed algorithm alleviates the training imbalance of the multi-scale detector and improves the overall accuracy by about 1% AP. Building on the two studies above, a multi-scale reinforcement learning strategy is proposed to address the feature-level and task-level imbalance of multi-scale detection. The scales in a multi-scale detector cannot be treated equally and independently, so a dynamic feature fusion algorithm is designed to amplify the influence of important feature scales in the training phase to mitigate feature imbalance 
without introducing additional model parameters. Meanwhile, a compensatory scale training algorithm is designed to strengthen the supervision of undertrained scales. The overall algorithm draws on the Markov decision process, designing states and rewards from the multi-scale loss values to jointly optimize the two imbalance problems. Experiments show that the proposed algorithm improves the overall accuracy of the model without increasing the computational burden, reaching 48.1% AP. To address the high energy consumption of deep detectors, a multi-scale object detection model based on spiking neural networks is proposed. A spiking neural network mimics neurons in the brain by transmitting information as discrete binary signals, and can run with extremely low power consumption on neuromorphic chips. Building on existing spiking neural network research, the proposed model introduces a multi-scale framework, optimizes the spiking model conversion scheme, and designs a corresponding encoding method, based on biological neural phenomena, to speed up information transfer. Experiments verify that some conversion methods used in image classification tasks are not suitable for object detection, and show that the designed model is superior to existing methods, with running energy consumption thousands of times lower than that of common detection models that rely on GPUs. The proposed multi-scale object detection algorithms based on deep neural networks mitigate several problems in existing methods and improve detector performance in terms of accuracy and efficiency. They can be applied to specific task scenarios such as small/large object detection and low-power detection, among others. |
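
The sub-pixel convolution that the feature pyramid modules borrow from super-resolution ends with a channel-to-space rearrangement, often called pixel shuffle. The sketch below illustrates only that generic rearrangement, not the proposed sub-pixel skip fusion or context enhancement modules themselves. The function name `pixel_shuffle`, the nested-list tensor layout, and the upscale factor `r` are assumptions made for illustration.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Illustrative sketch of the standard sub-pixel rearrangement;
    the names and the list-based layout are assumptions, not the
    implementation used in the work described above.
    """
    in_channels = len(x)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    out_channels = in_channels // (r * r)
    height, width = len(x[0]), len(x[0][0])

    # Allocate the upscaled output: fewer channels, larger spatial size.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(out_channels)]

    for c in range(out_channels):
        for h in range(height * r):
            for w in range(width * r):
                # Each output pixel reads one input channel, selected by
                # the pixel's offset inside its r x r output block.
                src = c * r * r + (h % r) * r + (w % r)
                out[c][h][w] = x[src][h // r][w // r]
    return out


# Four 1x2 channels become a single 2x4 channel.
low_res = [[[1, 10]], [[2, 20]], [[3, 30]], [[4, 40]]]
high_res = pixel_shuffle(low_res, 2)
```

Because the rearrangement only moves values, it upsamples a feature map without discarding channel information, which is one plausible reason for using it to counter channel attenuation in the pyramid.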