With the continuous growth of library collections, library resources are no longer confined to text, and multimedia resources such as images, audio, and video account for an increasing share of library holdings. As data types diversify and data volumes grow rapidly, traditional retrieval methods can no longer meet users' needs. Domestic libraries now hold a large number of multimedia resources, and many libraries have digitized them and made them available through online systems. However, these systems still rely on keyword-based, single-modality retrieval, which leaves libraries unable to serve users in multi-modal scenarios and hinders the intelligent development of libraries. In view of this situation, this paper explores the application of deep-learning-based cross-modal retrieval technology in libraries, proposes a fast and accurate cross-modal hash retrieval model, realizes a cross-modal library retrieval service, and provides a new retrieval approach for libraries. The main work of this paper is as follows:

(1) A library multi-modal data set is constructed. To address the lack of multi-modal data sets for library research, this paper collected images on multiple themes, including local intangible cultural heritage, historical scenes, minority cultures, children's book covers, old photographs, and New Year pictures, through the National Museum of Digital Books. Through data acquisition, data filtering, supplementing text descriptions, and manually annotating the images with the LabelImg tool, a multi-modal digital library data set suitable for cross-modal retrieval research was established.

(2) A cross-modal hash retrieval model based on feature fusion is proposed. To address the weak expression of semantic correlation between text and image features in current cross-modal retrieval models, this paper proposes a cross-modal hash retrieval model based on feature fusion. First, deep neural networks are used to extract text and image features; the two sets of features are then fused in an autoencoder so that their semantic information complements each other and the semantic gap between modalities is bridged. During learning, the semantic similarity between the two modalities is maximized, and a unified hash code is finally used to further strengthen the similarity between the hash codes of different modalities. This retrieval model fully exploits the complementary semantic roles of text and image, and experiments show that the algorithm performs well.

(3) A cross-modal hash retrieval model based on a semantic segmentation network is proposed. Although convolutional neural networks can extract rich image feature information, this information may contain noise, which increases retrieval time. To solve this problem, this paper applies a semantic segmentation algorithm to image feature processing: images are converted into feature maps by a semantic segmentation network, and the feature maps and text features are then converted into hash codes to compute their similarity. This text-to-text style of matching reduces noise interference and the time cost of retrieval. Meanwhile, to preserve more semantic similarity between image and text, the model computes inter-modal loss and intra-modal loss simultaneously, retaining more semantic similarity through training. Experimental results show that the model greatly reduces retrieval time while ensuring retrieval accuracy.
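To make the fusion-based hashing idea in (2) and the joint loss idea in (3) concrete, the following is a minimal PyTorch-style sketch. All module names, feature dimensions, and the loss weighting are assumptions chosen for illustration; they are not the exact architectures proposed in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHashNet(nn.Module):
    """Toy cross-modal hashing network: image and text features are fused in a
    shared autoencoder and projected into a common hash space."""
    def __init__(self, img_dim=2048, txt_dim=300, hidden_dim=512, hash_bits=64):
        super().__init__()
        self.img_enc = nn.Linear(img_dim, hidden_dim)      # image-feature branch
        self.txt_enc = nn.Linear(txt_dim, hidden_dim)      # text-feature branch
        self.fuse_enc = nn.Linear(hidden_dim * 2, hidden_dim)  # fuse both modalities
        self.fuse_dec = nn.Linear(hidden_dim, hidden_dim * 2)  # autoencoder decoder
        self.hash_layer = nn.Linear(hidden_dim, hash_bits)     # relaxed hash code

    def forward(self, img_feat, txt_feat):
        img_h = torch.relu(self.img_enc(img_feat))
        txt_h = torch.relu(self.txt_enc(txt_feat))
        fused = torch.relu(self.fuse_enc(torch.cat([img_h, txt_h], dim=1)))
        recon = self.fuse_dec(fused)                  # reconstruct the concatenated features
        code = torch.tanh(self.hash_layer(fused))     # continuous relaxation of the hash code
        return img_h, txt_h, recon, code

def loss_fn(img_h, txt_h, recon, code):
    """Combine reconstruction, cross-modal similarity, and quantization terms."""
    recon_loss = F.mse_loss(recon, torch.cat([img_h, txt_h], dim=1))
    # inter-modal term: encourage image and text representations to agree
    sim_loss = 1.0 - F.cosine_similarity(img_h, txt_h, dim=1).mean()
    # quantization term: push relaxed codes toward binary values {-1, +1}
    quant_loss = (code.abs() - 1.0).pow(2).mean()
    return recon_loss + sim_loss + quant_loss

# toy usage with random feature vectors
model = FusionHashNet()
img, txt = torch.randn(8, 2048), torch.randn(8, 300)
print(loss_fn(*model(img, txt)))
```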
(4) A digital library cross-modal retrieval system is designed and developed. To address the lack of a cross-modal search engine in library applications, and building on the two retrieval models proposed in this paper, a library cross-modal retrieval system is designed that integrates user management, feature extraction, and hash code generation, and supports both a fast search mode and an accurate search mode.
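For the fast and accurate search modes in (4), the following is a minimal sketch of how hash-based retrieval is commonly implemented: a Hamming-distance scan over binary codes for fast search, followed by re-ranking of the candidates with continuous features for accurate search. All function names and data shapes are illustrative assumptions, not the system's actual interfaces.

```python
import numpy as np

def to_binary(codes):
    """Binarize relaxed codes to {0, 1} bits for Hamming-distance search."""
    return (codes > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, top_k=10):
    """Fast search: rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:top_k]

def rerank(query_feat, db_feats, candidates):
    """Accurate search: re-rank Hamming candidates by cosine similarity
    of the original continuous features."""
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats[candidates] / np.linalg.norm(db_feats[candidates], axis=1, keepdims=True)
    return candidates[np.argsort(-(d @ q))]

# toy usage with random codes and features
db_codes = to_binary(np.random.randn(1000, 64))
db_feats = np.random.randn(1000, 512)
query_code = to_binary(np.random.randn(64))
query_feat = np.random.randn(512)
candidates = hamming_search(query_code, db_codes, top_k=50)
print(rerank(query_feat, db_feats, candidates)[:10])
```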