In the era of artificial intelligence, deep learning methods, represented by convolutional neural networks (CNNs), have excelled in many computer vision tasks. For category-level image retrieval, image representations produced by CNNs perform well, but so far they have failed to bring similar improvements to instance-level image retrieval. Instance-level image retrieval has more stringent technical requirements and also broader application prospects.

We argue that the underwhelming results of deep methods on instance-level image retrieval have four causes. 1) Most deep retrieval methods use networks as local feature extractors, leveraging models pretrained on large image classification datasets such as ImageNet. However, ImageNet is designed for distinguishing between different semantic categories, so models trained on it are expected to be robust to intra-class variability, whereas here we are interested in distinguishing between particular objects, even if they belong to the same semantic class. 2) The deep architectures and training procedures used are inappropriate for the task. 3) The learning ability of CNN models is underused: many methods use the CNN only as a local feature extractor and still require further parameters to be computed by hand, and they do not implement an end-to-end network model. 4) The activation functions are suboptimal. Most CNN models for instance-level image retrieval use existing activation functions such as sigmoid, ReLU, and PReLU. These functions have known shortcomings (ReLU, for example, can produce dead neurons during training and has outputs with a non-zero mean), which leaves considerable room for optimization.

The purpose of this thesis is to improve the accuracy of CNN-based instance-level image retrieval by addressing all four issues above. The main contributions are threefold.

1) For the training procedure, we use a siamese network that combines three streams with a triplet loss and explicitly optimizes the network weights to produce representations well suited to instance-level retrieval. The proposed ResNet-50-based model is pretrained on ImageNet and then fine-tuned on a dataset specifically selected for instance-level image retrieval.

2) For the representation, we build on the regional maximum activations of convolutions (R-MAC) descriptor, which is well suited to instance-level image retrieval. We note that all steps of the R-MAC pipeline can be integrated into a single CNN, and because every step involved in its computation is differentiable, we propose learning its weights end to end.

3) We propose a new activation function named TReLU. It retains the advantages of ReLU while eliminating the dead neurons that ReLU can produce during training, and it alleviates the problem of non-zero-mean outputs. We also propose an optimization scheme that improves the computational efficiency of TReLU.

The end-to-end CNN model for instance-level image retrieval implemented in this thesis incorporates all of the above. At the end of training, the proposed architecture produces, in a single forward pass, a global image representation that is well suited to image retrieval. Extensive experiments on Oxford5k, Paris6k, Holidays, and other datasets show that our approach significantly outperforms previous retrieval approaches, including state-of-the-art methods based on costly local descriptor indexing and spatial verification.
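The three-stream triplet training in contribution 1) can be illustrated with a minimal, framework-free sketch. It assumes the standard margin-based triplet loss on squared Euclidean distances, max(0, m + ||q - p||² - ||q - n||²); the margin value `margin=0.1`, the helper names, and the toy `embed` function (a hypothetical stand-in for the shared-weight ResNet-50 branch, which here only L2-normalizes its input) are assumptions for illustration, not the thesis's actual implementation.

```python
import math


def embed(x, eps=1e-12):
    """Hypothetical stand-in for the shared-weight CNN branch.

    A real model would map an image to a global descriptor; here we only
    L2-normalize a feature vector so the example stays self-contained.
    """
    norm = math.sqrt(sum(v * v for v in x))
    return [v / (norm + eps) for v in x]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(query, positive, negative, margin=0.1):
    """Margin-based triplet loss (the margin value is an assumption).

    The loss is zero once the negative is farther from the query than the
    positive by at least `margin` in squared distance.
    """
    return max(0.0, margin + squared_distance(query, positive)
               - squared_distance(query, negative))


def siamese_triplet_loss(q_input, p_input, n_input, margin=0.1):
    """Three streams, one set of weights: every input goes through `embed`."""
    return triplet_loss(embed(q_input), embed(p_input), embed(n_input), margin)


if __name__ == "__main__":
    q, p = [1.0, 0.0], [0.8, 0.6]
    easy_negative = [0.0, 1.0]   # far from q: the loss is already zero
    hard_negative = [0.8, -0.6]  # as close to q as p is: loss is (up to rounding) the margin
    print(triplet_loss(q, p, easy_negative))
    print(triplet_loss(q, p, hard_negative))
```

Because all three inputs pass through the same `embed` function, a gradient step on this loss would update one shared set of weights, which is what makes the three streams "siamese".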
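The R-MAC representation in contribution 2) can be sketched in plain Python on a feature map stored as nested lists of shape (channels, height, width). The sketch assumes a simplified region grid: at level l, square regions of side floor(2 * min(H, W) / (l + 1)) are placed at l evenly spaced positions along each axis, whereas the original R-MAC sampling also adjusts the number of regions along the longer side to keep roughly 40% overlap. The PCA-whitening step of the full pipeline is omitted, and the function names and the `levels=3` default are illustrative assumptions.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against all-zero input)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def _starts(dim, side, level):
    """Evenly spaced top-left offsets so every region fits inside `dim`."""
    if level == 1:
        return [(dim - side) // 2]
    return sorted({round(i * (dim - side) / (level - 1)) for i in range(level)})


def region_grid(height, width, levels=3):
    """Simplified multi-scale grid of square regions as (y0, x0, side)."""
    regions = []
    for level in range(1, levels + 1):
        side = max(1, (2 * min(height, width)) // (level + 1))
        for y0 in _starts(height, side, level):
            for x0 in _starts(width, side, level):
                regions.append((y0, x0, side))
    return regions


def rmac(feature_map, levels=3):
    """R-MAC-style global descriptor from a (C, H, W) nested-list feature map.

    For each region, max-pool every channel and L2-normalize the regional
    vector; then sum the regional vectors and L2-normalize the result.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    total = [0.0] * channels
    for y0, x0, side in region_grid(height, width, levels):
        pooled = [
            max(feature_map[c][y][x]
                for y in range(y0, y0 + side)
                for x in range(x0, x0 + side))
            for c in range(channels)
        ]
        # PCA-whitening of each regional vector would go here (omitted).
        total = [t + p for t, p in zip(total, l2_normalize(pooled))]
    return l2_normalize(total)


if __name__ == "__main__":
    # Toy 2-channel 4x4 map: channel 0 fires only in the top-left corner.
    fmap = [
        [[5.0 if (y < 2 and x < 2) else 0.0 for x in range(4)] for y in range(4)],
        [[1.0] * 4 for _ in range(4)],
    ]
    print(len(region_grid(4, 4)))  # 14 regions for levels=3
    print(rmac(fmap))
```

Every operation here (max pooling over a fixed region, normalization, summation) is differentiable almost everywhere, which is the property the abstract relies on to train the whole pipeline end to end.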
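TReLU itself is not defined in this abstract, so the sketch below does not attempt to implement it. It only makes concrete, for plain ReLU and PReLU, the two ReLU properties that TReLU is said to address: a neuron whose pre-activation is negative for every input receives zero gradient (a "dead" neuron), and ReLU outputs have a positive mean even for zero-mean inputs. The PReLU slope `a=0.25` is a common initial value, used here only as an example.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Subgradient convention: the gradient at x == 0 is taken as 0.
    return 1.0 if x > 0 else 0.0


def prelu(x, a=0.25):
    return x if x > 0 else a * x


def prelu_grad(x, a=0.25):
    return 1.0 if x > 0 else a


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    # 1) Dead neuron: every pre-activation in the batch is negative.
    dead_preacts = [-3.0, -1.5, -0.2]
    print("ReLU grads:", [relu_grad(z) for z in dead_preacts])    # all zero: no weight update
    print("PReLU grads:", [prelu_grad(z) for z in dead_preacts])  # slope a keeps gradients flowing

    # 2) Non-zero output mean for a zero-mean input.
    inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print("input mean:", mean(inputs))
    print("ReLU output mean:", mean([relu(z) for z in inputs]))
    print("PReLU output mean:", mean([prelu(z) for z in inputs]))
```

In this example PReLU removes the zero-gradient region but still shifts the output mean upward (0.45, versus 0.6 for ReLU), which illustrates why the non-zero-mean issue is treated as a separate goal from the dead-neuron issue.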