Cross-modal retrieval refers to retrieving data across different modalities: given a query in one modality, the goal is to return semantically similar data in another modality. Cross-modal hashing algorithms are now widely applied to Approximate Nearest Neighbor (ANN) search for large-scale multi-modal retrieval. Among them, supervised hashing algorithms can improve the quality of hash codes by exploiting the semantic similarity of data pairs, and have recently received increasing attention. Most existing supervised hashing algorithms for cross-modal retrieval rely on manually engineered feature vectors. With the development of machine learning, learned feature extraction has become widespread. Compared with hand-crafted feature vectors, learned features are more comprehensive and accurate, and the extracted features are then quantized into binary codes by a quantization method. However, the hash codes produced by traditional quantization methods are suboptimal, and there is considerable room for improvement in both network structure and quantization. In view of these problems and the shortcomings of existing methods, this thesis studies a cross-modal hashing method based on Convolutional Neural Networks (CNNs).

This thesis first studies the architecture and key technologies of CNNs, and then gives a comprehensive survey of natural language processing and hashing algorithms, which lays the foundation for the analysis and research of CNN-based cross-modal retrieval. Building on traditional deep-learning cross-modal retrieval methods, an optimized cross-modal retrieval model based on CNNs is established. To address problems of existing cross-modal methods, two different network structures are built for the data of the two modalities: an image network with multiple convolution and pooling layers extracts image features, and a Word2Vec network extracts text features. In addition, a K-means-based Quantitative-optimization method for Deep Cross-modal Hashing (KQDH) is proposed, which clusters the feature points with the K-means algorithm. Through a new quantization method that controls the quantization error and reduces the amount of computation, the resulting hash codes can better represent cross-modal features. Experiments show that this method maintains similarity between cross-modal data, captures richer semantic information, and completes cross-modal retrieval tasks more efficiently and accurately.

Finally, combining the theoretical research results with practice, a prototype system of the CNN-based cross-modal hashing method is designed and implemented. The prototype adopts the proposed deep cross-modal retrieval model and the quantitative-optimization hashing method. It consists of four functional modules (model management, feature extraction, data management and multi-modal retrieval) and can retrieve images by related text and vice versa. Test results show that the system completes cross-modal retrieval tasks quite well. The research results of this thesis provide ideas for research on convolutional neural networks in the cross-modal retrieval field, can be used in practical applications of cross-modal retrieval, and thus have good theoretical value and application prospects.
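The abstract does not give the exact KQDH formulation, but the general idea it describes (clustering continuous feature points with K-means, then assigning each point a compact binary code derived from its cluster) can be sketched as follows. This is a minimal, hypothetical illustration with plain NumPy, not the thesis's actual algorithm; the function names and the choice of encoding cluster indices as bit vectors are assumptions for demonstration.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-means: returns (centroids, per-point cluster labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each (non-empty) centroid to the mean of its points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def quantize_to_hash(X, n_bits=4, seed=0):
    """Hypothetical quantization step: cluster features into 2**n_bits
    groups, then emit each point's cluster index as an n_bits binary
    code (an illustration of K-means-based quantization, not KQDH)."""
    k = 2 ** n_bits
    centroids, labels = kmeans(X, k, seed=seed)
    # unpack each cluster index into its n_bits binary digits
    codes = ((labels[:, None] >> np.arange(n_bits)) & 1).astype(np.uint8)
    return codes, centroids

# toy demo: 200 random 16-dimensional "features" -> 4-bit hash codes
X = np.random.default_rng(1).normal(size=(200, 16))
codes, centroids = quantize_to_hash(X, n_bits=4)
print(codes.shape)  # (200, 4)
```

Because points in the same cluster share a code, nearby features quantize to identical or similar binary codes, which is how a clustering-based quantizer can keep the quantization error controlled while still producing short codes suitable for Hamming-distance retrieval.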