| Object detection, which comprises object classification and localization, is one of the most fundamental and challenging tasks in computer vision. It also serves as a cornerstone of image understanding, providing a strong feature classification basis for downstream tasks such as video tracking. In recent years, deep learning has used its powerful hierarchical feature extraction and learning capabilities to achieve greater robustness and generalization than traditional methods. Nevertheless, existing methods still fail to achieve satisfactory results when detecting objects of widely varying sizes and dense clusters of small objects. To address these problems, this thesis investigates three directions: enriching multi-size object information in multi-stage feature maps through feature fusion and reconstruction; capturing fine-grained object features under rich receptive fields by adopting visual attention mechanisms; and enhancing the feature representation capability of convolutional neural networks (CNNs) for small objects. The main research contents are as follows. (1) To address inefficient multi-scale object detection and redundant object information in the feature maps of the SSD algorithm, this thesis proposes a single-stage multi-scale object detector based on feature fusion and reconstruction. First, a multi-scale attention model is proposed to enhance the global semantic features of objects in the shallow, middle and deep feature maps. Second, an adaptive hierarchical feature map weighting mechanism is designed to perform fine-grained information fusion across the multi-stage feature maps. Finally, to reduce information redundancy in the deep feature maps, a feature map reconstruction module is proposed that segments, concentrates and reconstructs the resulting feature maps, focusing on their key information while eliminating redundant object features. (2) Existing visual attention models compute channel-wise and spatial attention separately, which leads to inadequate image feature extraction and poor feature representation of small objects. To address this, this thesis proposes a Feature Information-interaction Model, which uses an information weaving structure to fuse multidimensional feature maps and achieve fine-grained fusion of the channel-wise and spatial attention feature maps. Building on this model, an Adaptively Cyclic Feature Information-interaction Model is proposed, which repeatedly attends to local feature maps to extract, fuse and enhance both the global semantic features and the local contextual information of objects. Extensive experiments on object detection benchmark datasets show that the proposed approach outperforms existing attention models. |
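The abstract does not specify how the adaptive hierarchical feature map weighting in contribution (1) computes its weights. One widely used way to fuse multi-stage feature maps with learned per-stage weights is the "fast normalized fusion" from BiFPN (EfficientDet). The pure-Python sketch below illustrates that generic technique and is not the thesis's method. Its assumptions are that each map is single-channel and stored as a list of rows, that maps are brought to a shared resolution by nearest-neighbour resizing, that the function names `resize_nearest` and `weighted_fusion` are hypothetical, and that the per-stage scalar weights are passed in directly rather than learned by backpropagation.

```python
from typing import List, Sequence

# Hypothetical representation: a single-channel feature map as a list of rows (H x W).
FeatureMap = List[List[float]]


def resize_nearest(fmap: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Nearest-neighbour resize so maps from different stages share one shape.

    Works for both upsampling (deep -> shallow size) and downsampling.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def weighted_fusion(
    fmaps: Sequence[FeatureMap],
    raw_weights: Sequence[float],
    out_h: int,
    out_w: int,
    eps: float = 1e-4,
) -> FeatureMap:
    """Fuse multi-stage maps with BiFPN-style fast normalized fusion.

    raw_weights stand in for learnable per-stage scalars (an assumption here).
    ReLU keeps them non-negative; dividing by their sum plus eps makes the
    normalized weights sum to approximately 1 while avoiding division by zero.
    """
    if len(fmaps) != len(raw_weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(0.0, w) for w in raw_weights]  # ReLU on each weight
    total = sum(clipped) + eps
    norm = [w / total for w in clipped]
    resized = [resize_nearest(f, out_h, out_w) for f in fmaps]
    # Element-wise weighted sum over stages at every spatial position.
    return [
        [sum(n * r[i][j] for n, r in zip(norm, resized)) for j in range(out_w)]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    shallow = [[1.0] * 4 for _ in range(4)]  # hypothetical 4x4 shallow-stage map
    deep = [[3.0, 5.0], [7.0, 9.0]]          # hypothetical 2x2 deep-stage map
    fused = weighted_fusion([shallow, deep], [1.0, 1.0], out_h=4, out_w=4)
    for row in fused:
        print([round(v, 3) for v in row])
```

Normalizing by the sum of ReLU-clipped weights, rather than applying a softmax, is the cheaper option that BiFPN adopted; a softmax over the raw weights would be an equally plausible reading of "adaptive" weighting. A stage whose weight is driven to zero or below drops out of the fused map entirely.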