The Study Of Image Retrieval Based On Region Evaluation And Relation Modeling

Posted on: 2020-12-27 | Degree: Master | Type: Thesis
Country: China | Candidate: J Wang | Full Text: PDF
GTID: 2428330599454711 | Subject: Software engineering
Abstract/Summary:
Content-based image retrieval (CBIR), commonly known as "searching for similar images", has long been a fundamental research topic in the computer vision community. It influences many related research fields and has a wide range of commercial applications. With the explosive growth of multimedia data on the network and the increasing demand from practical applications such as autonomous driving and augmented reality, image retrieval has become both a fundamental and a practical research topic. In recent years, deep learning methods and theories have achieved great success in artificial intelligence, particularly in representative pattern recognition directions such as object recognition, speech recognition, and object detection. The image retrieval community has likewise combined traditional encoding and aggregation methods with deep convolutional neural network (CNN) features to obtain compact global feature representations. CNN features offer strong representation ability at low dimensionality, and are increasingly widely used in industry and academia.

The research in this thesis is carried out mainly on two types of image retrieval datasets: standard object retrieval datasets and visual place recognition datasets. These can be regarded as object retrieval and multi-object retrieval datasets, respectively. Although recent work has raised retrieval accuracy on these datasets to a very high level, there is still room for improvement in current methods. This thesis further improves retrieval accuracy by improving existing methods. The main contributions are:

(1) Effectively solving the over-counting problem. Objects in natural images, especially buildings, contain repetitive structures to a greater or lesser degree. Previous work has shown that repetitive structures cause the corresponding features to be over-counted, which distorts the similarity measure between images. Exploiting the spatial characteristics of CNN feature maps, we use pyramid pooling to aggregate the feature map into regional features. The regional max pooling in pyramid pooling effectively avoids over-counting of local structures. In our experiments we also found that the PCA whitening commonly used in image retrieval excessively penalizes over-counting in global features. We therefore propose PCA power whitening, which handles the over-counting problem more reasonably by setting a variance scaling factor.

(2) Reducing the influence of background and confusing objects through region evaluation. The region of interest (ROI) in image retrieval datasets usually occupies only part of the image, while the widespread background and confusing regions affect the image similarity measure during retrieval. Inspired by the attention mechanism popular in the NLP field, we propose two kinds of attention modules that evaluate each regional feature and generate a corresponding weight, adaptively assigning large weights to the ROI and small weights to background and confusing regions so as to reduce their contribution to image similarity. In our experiments, the attention modules effectively improve the discriminative ability of both regional and local features.

(3) Image retrieval with relation features. Previous image retrieval methods based on CNN global features assume that two images depict the same place if they contain enough similar objects, while relation information is neglected. However, the relationships between objects are important cues for matching two images. Building on regional features, which show competitive performance in image retrieval, and inspired by the relationship modeling framework widely adopted in visual relation detection, we propose a regional relation module that models the relationships among regional features and generates relation features, which form relation feature maps. Compared with a traditional CNN feature map, the relation feature map contains higher-level information, including both object appearance and the relationships between objects, and usually performs better when combined with commonly used aggregation methods. Furthermore, by analyzing the spatial characteristics of the relation feature map, we propose a cascade pooling method that effectively improves retrieval accuracy.
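
As a rough illustration of contribution (1), the sketch below shows regional max pooling over a grid pyramid and a power-scaled variant of PCA whitening. It is not the thesis's implementation. The abstract does not give the pyramid layout or the whitening formula, so the grid levels, the function names, and the exponent `alpha` (standing in for the "variance scaling factor") are assumptions made for illustration. Here `alpha = 1` reduces to ordinary PCA whitening and `alpha = 0` to a plain PCA rotation.

```python
import numpy as np


def pyramid_regional_max_pool(feature_map, levels=(1, 2, 3)):
    """Max-pool a (C, H, W) feature map over an l x l grid per pyramid level.

    Returns an (R, C) array with one L2-normalised descriptor per region.
    Assumes H and W are at least max(levels), so that no grid cell is empty.
    The grid layout is a hypothetical choice, not taken from the thesis.
    """
    _, h, w = feature_map.shape
    regions = []
    for level in levels:
        # Integer cell boundaries that together cover the whole map.
        ys = np.linspace(0, h, level + 1).astype(int)
        xs = np.linspace(0, w, level + 1).astype(int)
        for i in range(level):
            for j in range(level):
                cell = feature_map[:, ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                # Max pooling: a pattern repeated many times inside the cell
                # contributes the same value as a single occurrence.
                regions.append(cell.max(axis=(1, 2)))
    regions = np.stack(regions)
    norms = np.linalg.norm(regions, axis=1, keepdims=True)
    return regions / np.maximum(norms, 1e-12)


def fit_pca_power_whitening(X, alpha=0.5, eps=1e-8):
    """Fit PCA on X of shape (N, D) and scale each axis by eigenvalue**(-alpha/2).

    `alpha` is an assumed form of the variance scaling factor:
    alpha = 1 gives standard whitening, alpha = 0 gives rotation only.
    Returns (mean, projection) with projection of shape (D, D).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    scale = (eigvals + eps) ** (-alpha / 2.0)
    return mean, eigvecs * scale  # column k is scaled by scale[k]


def apply_whitening(x, mean, projection):
    """Project descriptors of shape (..., D) and L2-normalise the result."""
    y = (x - mean) @ projection
    return y / np.maximum(np.linalg.norm(y, axis=-1, keepdims=True), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fmap = rng.random((16, 7, 7))  # stand-in for a CNN feature map
    regional = pyramid_regional_max_pool(fmap)
    mean, proj = fit_pca_power_whitening(regional, alpha=0.5)
    print(apply_whitening(regional, mean, proj).shape)
```

Intermediate values of `alpha` damp high-variance directions less aggressively than full whitening, which is one plausible reading of how a scaling factor could soften the excessive penalty described above.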
Keywords/Search Tags:Image Retrieval, Visual Recognition, Region Feature, Attention Mechanism, Visual Relation Reasoning