
Weakly Supervised Object Detection Based On Bilinear Attention Feature Fusion

Posted on: 2022-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: C B Cai
GTID: 2518306539962839
Subject: Computer technology
Abstract/Summary:
Object detection has been an active research topic in recent years. Its goal is to classify the object instances in a natural image and to locate each instance with a bounding box. It is widely used in medical image detection, face recognition, video surveillance, and other fields. Traditional object detection techniques rely on large image datasets with accurate annotations, but collecting and labeling such data takes considerable time and effort. In contrast, object detection based on weakly supervised learning builds the detection model using only image-level annotations. Image-level annotation only requires labeling the categories present in the image, not the precise location of each object instance, so it is easier to obtain.

Current weakly supervised object detection is mostly based on multi-instance learning. However, multi-instance learning is limited by the region proposal algorithm and easily falls into local optima; that is, the detected bounding box covers only part of the target object. In addition, the correlation between spatial information and global context information is not fully exploited during image feature extraction, which lowers recognition accuracy. To obtain high-quality candidate regions and extract high-level semantic information from images, this thesis proposes an end-to-end weakly supervised object detection method based on bilinear attention feature fusion. The specific research content is as follows.

First, this thesis proposes a method for obtaining high-quality candidate regions. Exploiting the high response of the Grad-CAM algorithm to category information, 10 segmentation thresholds are set for the activation map of each specific category, evenly distributed between the maximum gray value of the activation map and the average gray value of all its pixels. For each threshold, the maximum connected area method yields a bounding box, and the resulting set of boxes serves as object proposals. These proposals are used to filter out selective search candidate regions that are irrelevant or contain only a small portion of an object, which greatly improves the detection rate.

Second, this thesis proposes a bilinear attention feature fusion model. A first-order attention global context module captures the correlation between global features, and a second-order bilinear pooling module extracts the local features of the image; the two resulting feature representations have the same dimension and are fused. Features at different levels and scales are thus used effectively to obtain a high-level representation of the image, which is sent to the object detection network for classification and localization.

Finally, this thesis incorporates online instance classification refinement. Proposal features are obtained by applying RoI pooling to the candidate regions and the high-level feature representation, and the extracted proposal feature vectors are fed into the deep detection network, where multi-stage instance branches refine the multi-instance learning head.

Experiments show that the proposed method detects only high-quality region boxes, significantly improves detection speed, and produces boxes that cover the object more completely. Combined with the bilinear attention feature fusion module, it not only retains image feature information but also captures the correlation among the positional context information of the image, reduces the loss of important features during feature extraction, and greatly improves object classification accuracy. On the PASCAL VOC 2007 dataset, the method achieves strong performance in weakly supervised object detection, with mean average precision (mAP) and correct localization (CorLoc) reaching 51.0% and 70.1%, respectively.
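
The threshold-based proposal step in the first contribution can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that a class activation map (for example from Grad-CAM) is already available as a 2D grid of values, that the "maximum connected area method" means taking the largest 4-connected region above each threshold, and that the 10 thresholds lie strictly between the mean and the maximum. The function names and the toy map are hypothetical.

```python
from collections import deque


def evenly_spaced_thresholds(cam, n_thresholds=10):
    """Return n thresholds evenly spaced between the mean and max activation.

    Assumption: both endpoints are excluded; the abstract does not say
    whether the mean and the maximum themselves are used as thresholds.
    """
    values = [v for row in cam for v in row]
    lo, hi = sum(values) / len(values), max(values)
    step = (hi - lo) / (n_thresholds + 1)
    return [lo + step * (i + 1) for i in range(n_thresholds)]


def largest_component_box(cam, threshold):
    """Box (x_min, y_min, x_max, y_max) of the largest 4-connected region
    with activation >= threshold, or None if no pixel passes."""
    h, w = len(cam), len(cam[0])
    seen = [[False] * w for _ in range(h)]
    best_size, best_box = 0, None
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or cam[y0][x0] < threshold:
                continue
            # Breadth-first flood fill over one connected region.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            size = 0
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                y, x = queue.popleft()
                size += 1
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and not seen[ny][nx] and cam[ny][nx] >= threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, (x_min, y_min, x_max, y_max)
    return best_box


def cam_proposals(cam, n_thresholds=10):
    """Collect one box per threshold, dropping exact duplicates."""
    boxes = []
    for t in evenly_spaced_thresholds(cam, n_thresholds):
        box = largest_component_box(cam, t)
        if box is not None and box not in boxes:
            boxes.append(box)
    return boxes


# Toy activation map (hypothetical values): one strong blob and one weak pixel.
cam = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.6, 0.0, 0.0],
    [0.0, 0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.3],
]
proposals = cam_proposals(cam)
print(proposals)  # [(1, 1, 2, 2), (2, 2, 2, 2)]
```

Low thresholds keep the whole blob and high thresholds shrink the box toward the peak, so the set spans loose to tight boxes around the same object. How these boxes are then matched against selective search regions (for example by an overlap criterion) is not specified in the abstract and is left out of the sketch.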
Keywords/Search Tags:object detection, weakly supervised learning, multi-instance learning, bilinear attention