| With the development of the Internet and the spread of multimedia capture devices such as smartphones, multimedia data is growing explosively, and image and video data have become one of the main data types of the big data era. To bridge the semantic gap between low-level features and high-level semantics, and to mitigate the problems that global image descriptors lack geometric invariance and consume much storage, we study several key technologies for image descriptor construction, feature fusion and binarization in content-based image retrieval, drawing on deep learning, feature encoding and hashing. The paper uses global and local image information, and deep and hand-crafted features together, to build image descriptors that capture global and local as well as high-level and low-level information. The discriminative ability and scale invariance of the descriptor are enhanced by fusing multi-level features and multi-scale information. Moreover, by compressing the real-valued descriptor into binary codes, storage consumption is reduced without significant loss of accuracy. To address the problem that features from the high-level layers of a Convolutional Neural Network (CNN) contain rich high-level semantics but lack low-level information, a multi-level feature fusion (MFF) descriptor is proposed. MFF simultaneously captures low-level color and edge characteristics and high-level semantics from ResNet. MFF fuses color, SIFT, mid-level information from the convolutional layers and high-level information from the fully-connected layer, integrating these multi-level features into a single descriptor. To further fuse the multi-level features of MFF, a neural network is trained as a nonlinear transformation, which transforms MFF from a structured representation into an unstructured one. To save storage, a compressed version is proposed in which a sign function binarizes the descriptor. In addition, two
distance computation strategies are designed for the compressed version to compute the dissimilarity between images: symmetric distance computation and asymmetric distance computation. Experiments show that the multi-level features are complementary and effective, that the nonlinear transformation improves the accuracy of MFF, and that the compressed version saves a large amount of storage without significantly decreasing accuracy relative to the real-valued version. To address the problem that the fully-connected layer features of a convolutional neural network contain rich high-level information but lack local information and geometric invariance, a global-object-salient (GOS) image descriptor containing multi-scale information and multi-level features is proposed. GOS is composed of three levels: global, object and salient. It integrates information from the whole image, the rectangular object regions and the salient region. The global level uses multi-scale information from the whole image through a multi-resolution strategy, the object level uses an object detection method to capture multi-scale information, and the salient level uses a saliency detection network to capture information from the salient region. Because the scales and positions of the objects in an image are uncertain, and an object is not necessarily at the center of the image, GOS enhances geometric invariance by integrating the multi-scale information at the global and object levels. The complementarity of the three components and the effectiveness of GOS are verified, incorporating the salient level is shown to be effective, and GOS achieves competitive performance on the image retrieval task. To reduce the storage consumption of the image descriptor, an iterative sparse hashing learning algorithm, the multi-level semantic binary descriptor (MSBD) learning algorithm, is proposed. The algorithm applies a sparsity constraint on the hash codes to reduce the redundancy in the real-valued descriptor
used while minimizing the quantization error; in addition, the algorithm preserves discriminativeness while binarizing a descriptor that has multiple semantic levels. An orthogonal rotation is used to reduce the correlation between descriptor dimensions and thereby increase the information carried by the binary codes. The algorithm alternately sparsifies the codes and updates the orthogonal matrix and the codes. A dissimilarity metric is also proposed that integrates the visual semantics of the hash codes with the high-level concept information of the class probability vector, which effectively improves image retrieval accuracy. Experiments on public image retrieval datasets show that MSBD is compact and discriminative, outperforming many state-of-the-art real-valued descriptors with relatively small space consumption. The paper has investigated how to increase discriminativeness by fusing multiple features, how to enhance scale invariance, and how to compress the real-valued descriptor without losing much information through iterative hashing learning. However, some problems remain open. The paper focuses on descriptor discriminativeness and storage consumption; the speed of building the real-valued descriptor is a shortcoming that needs further study. Some datasets have labels that are semantic words, and how to use the image labels and the visual information together to further improve discriminative ability remains to be studied. For image descriptor compression, MSBD is a two-stage method and is not optimal; how to combine MSBD with an end-to-end learning strategy that learns the feature extractor and the hash function simultaneously, to obtain optimal discriminative binary codes, also remains to be studied. |
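
The compressed MFF descriptor described above relies on sign binarization and on two ways of comparing images: symmetric and asymmetric distance computation. The abstract does not give the exact formulas, so the following is only a minimal sketch under common conventions, not the paper's implementation. It assumes that codes take values in {-1, +1}, that zero is mapped to +1, that the symmetric distance is the Hamming distance between two binary codes, and that the asymmetric distance is the squared Euclidean distance between a real-valued query and a binary database code. All function names are hypothetical.

```python
from typing import List, Sequence


def binarize(descriptor: Sequence[float]) -> List[int]:
    """Binarize a real-valued descriptor with a sign function.

    Assumption: zero maps to +1 (the source does not specify tie handling).
    """
    return [1 if value >= 0 else -1 for value in descriptor]


def symmetric_distance(code_a: Sequence[int], code_b: Sequence[int]) -> int:
    """Hamming distance: both the query and the database item are binarized."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def asymmetric_distance(query: Sequence[float], code: Sequence[int]) -> float:
    """Squared Euclidean distance between a real-valued query and a binary code.

    Only the database side is compressed, so the query keeps its full precision.
    """
    if len(query) != len(code):
        raise ValueError("query and code must have the same length")
    return sum((q - c) ** 2 for q, c in zip(query, code))


def rank_by_asymmetric_distance(query, database_codes):
    """Return database keys sorted from most to least similar to the query."""
    return sorted(
        database_codes,
        key=lambda name: asymmetric_distance(query, database_codes[name]),
    )
```

In this sketch the symmetric variant needs only binary codes on both sides and is therefore the cheaper option, while the asymmetric variant avoids quantizing the query, which usually preserves more ranking information at the cost of floating-point arithmetic. For example, `rank_by_asymmetric_distance(query, {name: binarize(vec) for name, vec in database.items()})` ranks a hypothetical `database` dictionary of real-valued descriptors that are stored only as binary codes.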