| Object detection is one of the core problems in computer vision and the basis for solving high-level vision tasks. In recent years, deep learning, represented by convolutional neural networks, has swept the field of computer vision, and object detection research has made great progress as a result. However, problems in the training process, such as sample imbalance and unreasonable sample selection, restrict its further development. The heavy computational load and large network size also make it challenging to deploy these algorithms in practical scenarios. To meet the demand for higher detection accuracy and faster inference in object detection networks, this dissertation conducts research from two aspects: training optimization and network structure simplification. The training process is optimized by exploring how sample imbalance and label assignment affect the network during training, so that accuracy improves without greatly increasing the network size. In addition, based on the characteristics of detection networks, a channel pruning method and a dynamic inference method designed specifically for the detection task are proposed to accelerate inference. The main research contents and contributions of the dissertation are summarized as follows. First, an object detection algorithm based on a uniform IoU distribution is proposed to address the imbalanced distribution of positive samples across quality levels in current two-stage methods. By perturbing the ground-truth boxes and sampling uniformly, the same number of samples is obtained for each IoU interval, and these samples replace the outputs of the original region proposal network as the second-stage training samples. This method effectively increases the proportion of high-quality samples during training, so that the network can produce more accurate detection boxes when given high-quality
proposals. In addition, by updating the features of each proposal after regression, this method also solves the feature offset problem encountered by the IoU prediction branch during inference and enhances the network’s ability to predict localization quality. Second, an object detection method based on a multi-label assignment strategy is proposed. Based on an analysis of how the two current mainstream label assignment rules affect the detector, two sets of training samples are defined, one using the one-to-many rule and one using the one-to-one rule. An alignment module is also designed that generates masks by fully mining the relationships among the prediction results, transforming one-to-many classification scores into one-to-one classification scores and thereby taking over the role of the non-maximum suppression algorithm. The method combines the advantages of both rules and frees the network from its dependence on non-maximum suppression, so that fully end-to-end object detection is realized and performance is effectively improved. Third, regarding model compression for detectors, current channel pruning algorithms are mainly designed for classification networks; they ignore the difference between the classification task and the detection task and cannot accurately locate the key channels of each layer in a detection network. This dissertation proposes a localization-aware channel pruning algorithm for object detection networks. The importance of each channel to performance is measured by the location-aware loss and the local reconstruction error, which helps identify channels that contain key information for both classification and localization. Then, through the constructed localization-aware network, channel pruning is applied to both the backbone and the detection head, which greatly reduces the number of network channels while maintaining the performance of the
model. Finally, to meet the need for a dynamic inference structure in object detection, an object detection algorithm based on dynamic inference is proposed. This method builds an object detection framework with dynamic network depth by constructing a multi-scale dense connection structure. By simplifying the multi-scale feature fusion structure and the detection head, it also solves the sharp increase in computation and parameters caused by the dynamic inference process. The algorithm adaptively adjusts the network structure according to the input and, by dynamically reducing redundancy in the network structure, speeds up overall inference while preserving accuracy. |
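
The first contribution, obtaining the same number of training samples in each IoU interval by perturbing the ground truth, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the abstract does not specify the perturbation scheme, the interval boundaries, or the per-interval sample count, so the function names (`iou`, `perturb`, `sample_uniform_iou`), the jitter range, the bin edges, and the rejection-sampling loop below are all assumptions chosen for illustration.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def perturb(gt, rng, max_jitter=0.3):
    """Randomly shift the center and rescale the sides of a ground-truth box.

    The jitter magnitude is itself drawn at random (an assumption of this
    sketch) so that both small and large perturbations occur often.
    """
    m = rng.uniform(0.0, max_jitter)
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    cx = gt[0] + w / 2 + rng.uniform(-m, m) * w
    cy = gt[1] + h / 2 + rng.uniform(-m, m) * h
    w *= 1 + rng.uniform(-m, m)
    h *= 1 + rng.uniform(-m, m)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def sample_uniform_iou(gt, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                       per_bin=4, seed=0, max_tries=10_000):
    """Collect up to `per_bin` perturbed boxes for every IoU interval.

    Uses simple rejection sampling: a perturbed box is kept only if the
    interval its IoU falls into still has room.
    """
    rng = random.Random(seed)
    bins = [[] for _ in range(len(edges) - 1)]
    for _ in range(max_tries):
        if all(len(b) == per_bin for b in bins):
            break
        box = perturb(gt, rng)
        value = iou(box, gt)
        for i, bucket in enumerate(bins):
            if edges[i] <= value < edges[i + 1] and len(bucket) < per_bin:
                bucket.append(box)
                break
    return bins


if __name__ == "__main__":
    gt_box = (10.0, 20.0, 110.0, 220.0)
    for i, bucket in enumerate(sample_uniform_iou(gt_box)):
        print(i, [round(iou(b, gt_box), 3) for b in bucket])
```

Each interval ends up with the same number of boxes, so high-IoU samples are no longer rare compared with what a region proposal network would typically supply. In a two-stage detector, boxes like these would stand in for the proposals fed to the second stage during training.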