Towards Deep Compact Visual Descriptor Via Fisher Network With Binary Embedding

Posted on: 2020-06-23
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Qian
GTID: 2428330575963612
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
In large-scale mobile visual search, the compactness of the visual descriptor is of fundamental importance for retrieval efficiency. The Fisher Vector (FV) is a highly discriminative global descriptor that has achieved excellent performance in large-scale visual search. For resource-limited devices such as mobile phones and embedded devices, however, a compact global descriptor is crucial, and the high-dimensional FV is not suitable. Hashing has been widely used to embed high-dimensional global descriptors into low-dimensional binary codes, but these codes are less discriminative than the original descriptors. In the conventional approach to obtaining a compact visual descriptor, the FV is extracted first and then encoded by hashing. Learning hash codes from the high-dimensional FV is therefore a two-stage process, learning the FV codebook and then learning the hashing encoder, which makes the final binary codes sub-optimal. In recent years, a growing number of researchers have focused on end-to-end deep neural networks that map an image directly to binary codes, but such codes are not very discriminative and are not optimal for large-scale visual search.

To solve these problems, we propose a novel compact image description scheme based on an end-to-end deep neural network for large-scale image retrieval. The proposed network consists of two blocks: the Fisher network and the binary embedding network. The Fisher network is a learnable network that mimics the traditional FV encoding scheme and can be trained jointly with other neural networks. The binary embedding network encodes the high-dimensional FV produced by the Fisher network into a medium-length binary code. The two modules are trained end-to-end, so the codebook and the binary encoder are optimized jointly rather than in separate stages. The proposed network takes the local feature descriptors of an image as input and outputs an image-level binary signature. The model is trained on image labels in a supervised manner. The output binary signature preserves the semantic similarity between images while being as short as possible. Experiments on MPEG-7 CDVS and ILSVRC2010 show that the proposed compact image description scheme performs better than the traditional two-stage encoding method.
Keywords/Search Tags:Large-scale Mobile Visual Search, Compact Visual Descriptor, Aggregated Descriptor, Binary Coding
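
The data flow described in the abstract (local descriptors, then an aggregated Fisher Vector, then a binary signature compared by Hamming distance) can be illustrated with a minimal, non-learned sketch. This is not the thesis's network. It assumes a fixed diagonal-covariance Gaussian mixture, keeps only the first-order (mean-gradient) FV terms, and uses a random projection followed by a sign threshold as a stand-in for the learned binary embedding. All names and sizes (`fisher_vector`, `binary_embed`, `K`, `D`, `BITS`) are hypothetical, and the inputs are synthetic.

```python
import math
import random


def soft_assign(x, means, sigmas, weights):
    """Posterior probability of each diagonal-Gaussian component for descriptor x."""
    log_p = []
    for mu, sg, w in zip(means, sigmas, weights):
        ll = math.log(w)
        for xd, md, sd in zip(x, mu, sg):
            ll -= 0.5 * ((xd - md) / sd) ** 2 + math.log(sd)
        log_p.append(ll)
    top = max(log_p)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in log_p]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descs, means, sigmas, weights):
    """First-order FV (gradient w.r.t. the means), L2-normalized, length K*D."""
    k_count, dim = len(means), len(means[0])
    fv = [0.0] * (k_count * dim)
    for x in descs:
        gamma = soft_assign(x, means, sigmas, weights)
        for k in range(k_count):
            for d in range(dim):
                fv[k * dim + d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    n = len(descs)
    for k in range(k_count):
        scale = 1.0 / (n * math.sqrt(weights[k]))
        for d in range(dim):
            fv[k * dim + d] *= scale
    norm = math.sqrt(sum(v * v for v in fv)) or 1.0
    return [v / norm for v in fv]


def binary_embed(fv, proj):
    """Stand-in for the learned embedding: linear projection, then sign threshold."""
    return [1 if sum(w * v for w, v in zip(row, fv)) >= 0 else 0 for row in proj]


def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


# Synthetic setup (assumed sizes): K components, D-dim descriptors, BITS-bit codes.
rng = random.Random(0)
K, D, BITS = 4, 8, 16
means = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
sigmas = [[1.0] * D for _ in range(K)]
weights = [1.0 / K] * K
proj = [[rng.gauss(0, 1) for _ in range(K * D)] for _ in range(BITS)]

# Two "images", each a bag of 50 random local descriptors.
image_a = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(50)]
image_b = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(50)]

fv_a = fisher_vector(image_a, means, sigmas, weights)
code_a = binary_embed(fv_a, proj)
code_b = binary_embed(fisher_vector(image_b, means, sigmas, weights), proj)
print(len(fv_a), code_a, hamming(code_a, code_b))
```

In the scheme the abstract describes, the mixture parameters and the projection would instead be network weights trained jointly from image labels. The fixed values here only show how a K*D-dimensional FV is reduced to a short code that can be compared by Hamming distance.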