| With the continued development of computer vision research, object detection has remained an active research direction. It has practical significance as an auxiliary or main task in many fields, such as robotic equipment, anomaly detection for industrial parts, and monitoring equipment. Computer vision tasks are diverse: in addition to general object detection, there are specialized branches such as face detection, pedestrian detection, and pedestrian recognition. Because of its critical position in both scientific research and industry, object detection has become a research hotspot in recent years. Deep learning is currently developing rapidly, which in turn drives the rapid development of object detection algorithms. The primary purpose of object detection is to locate each object of interest in an image and to predict its category. For example, face detection aims to locate human faces, and pedestrian detection aims to locate pedestrians in an image. Following the success of deep learning in various computer vision tasks, object detection methods based on deep learning have emerged in recent years. An object detection network consists mainly of three stages: feature extraction, feature enhancement, and object prediction. Improvements to detection networks usually target the feature enhancement and object prediction stages. Current state-of-the-art object detectors usually use a feature pyramid network (FPN) for feature enhancement. The FPN fuses multi-scale feature information, enabling the detector to handle objects of different sizes better. However, reducing the feature dimension in the FPN causes significant information loss. Moreover, the FPN has only a top-down structure, which supplements the shallow layers with deep information. The deep-layer features are not supplemented, and their information is 
still lost. To address the insufficient feature fusion in current object detection algorithms and the gap between deep semantic features and shallow detailed features, an object detection model based on multi-scale feature enhancement is proposed. The main research contents are as follows: (1) To address the semantic gap between deep and shallow information, a Scale Fusion module is designed to gradually supplement the deep features with the rich detailed information in the shallow features. Because the deepest features contain rich semantic information while detailed information gradually disappears over repeated downsampling operations, there is a severe semantic gap between deep and shallow feature information. Since semantic information is inconsistent across features at different scales, directly fusing features with large semantic differences reduces the representational ability of multi-scale features. A multi-scale semantic fusion module is therefore added to the Scale Fusion module: the detailed information is supplemented into the highest-level features through the middle layer, reducing the impact of the semantic gap. (2) To address insufficient feature fusion, a bidirectional pyramid module is designed. A pyramid module and a reverse pyramid module are combined to supplement the shallow features with the semantic information of the deep features and to gradually transfer the detailed information of the shallow features to the deep-layer features. The Scale Fusion and bidirectional pyramid modules fuse features from different levels, alleviating the information loss caused by dimensionality reduction. (3) To enhance features for object prediction, a pixel-region attention module is designed to obtain the correlation between each position of the image and different regions with only a small amount of computation, thereby improving detection accuracy. |
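
The combination of a top-down pyramid with a reverse (bottom-up) pyramid described in item (2) can be illustrated with a minimal NumPy sketch. This is not the proposed model. It assumes that all levels already share the same channel count, that spatial size halves from one level to the next, that levels are fused by element-wise addition, and that the top-down pass runs before the bottom-up pass. The function names (`upsample2x`, `downsample2x`, `top_down`, `bottom_up`, `bidirectional_pyramid`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def downsample2x(x):
    """2x2 average pooling of a (C, H, W) feature map (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def top_down(features):
    """FPN-style pass: deep semantic information flows into shallower levels.

    `features` is ordered from shallow (largest map) to deep (smallest map).
    """
    out = [None] * len(features)
    out[-1] = features[-1]  # the deepest level receives nothing in this pass
    for i in range(len(features) - 2, -1, -1):
        out[i] = features[i] + upsample2x(out[i + 1])
    return out


def bottom_up(features):
    """Reverse pass: shallow detail flows into deeper levels."""
    out = [features[0]]  # the shallowest level receives nothing in this pass
    for i in range(1, len(features)):
        out.append(features[i] + downsample2x(out[i - 1]))
    return out


def bidirectional_pyramid(features):
    """Assumed ordering: top-down pass first, then the bottom-up pass."""
    return bottom_up(top_down(features))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical pyramid levels with 8 channels each.
    feats = [
        rng.standard_normal((8, 32, 32)),
        rng.standard_normal((8, 16, 16)),
        rng.standard_normal((8, 8, 8)),
    ]
    fused = bidirectional_pyramid(feats)
    print([f.shape for f in fused])
```

With only the top-down pass, the deepest level is returned unchanged, which corresponds to the limitation noted for the FPN. After the added bottom-up pass, every level has received information from at least one neighbour. A real detector would typically use learned convolutions rather than plain addition and pooling.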