
Design And Implementation Of A Cross-modal Retrieval System Based On Deep Adversarial Hashing Technology

Posted on: 2022-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2518306602965619
Subject: Master of Engineering
Abstract/Summary:
With the continuous development of information technology and the growing demand for leisure and entertainment, the we-media (self-media) industry has developed rapidly, and multimedia data on the Internet is increasing day by day. The growth in both the volume and the variety of data makes retrieval difficult for users, whose expectations keep rising: a user hopes to input data in one modality and retrieve data with the same semantics in other modalities. For example, given an input picture, the system should return text, video, and audio with the same semantics. Research on cross-modal retrieval technology is therefore indispensable. Cross-modal retrieval integrates multiple modalities such as text, images, video, and audio, and bridges the low-level representation gap between modalities by exploiting their complementary information.

Although much related research on cross-modal retrieval exists, many problems remain. Most current cross-modal retrieval algorithms handle only two modalities (images and text), which cannot meet users' needs. Multi-modal data with very different low-level structures share only weak general relevance, resulting in low retrieval accuracy. Feature vectors extracted from the low-level structure of multi-modal data have high dimensionality, resulting in mediocre retrieval speed.

In response to the above problems, this thesis makes the following improvements: (1) Expand the number of modalities, extending the common two-modal retrieval to four modalities: image, text, video, and audio. (2) To improve retrieval accuracy, use deep learning to extract features from modal data, with a different network designed for each modality: VGGNet or ResNet for images; Word2vec with a CNN, or one-hot encoding with a fully-connected network, for text; the C3D network for video; and Log-Mel spectrogram features with a CNN for audio. The extracted feature vectors are fed into an adversarial network to learn high-level semantic associations: under adversarial training, the feature distributions of different modalities with the same semantics tend toward consistency, strengthening the association between modalities. (3) To improve retrieval efficiency, hash the high-dimensional feature vectors into low-dimensional binary hash codes.

This thesis designs a cross-modal retrieval system based on deep adversarial hashing technology, developed mainly with Django and Vue.js. The system is divided into three modules: the cross-modal retrieval module, the system management module, and the data management module. (1) The cross-modal retrieval module provides the retrieval function: the user inputs modal data, and the system identifies the modality of the incoming data, extracts its feature vector, hashes it into a low-dimensional binary code, and returns data of the same semantics in other modalities via similarity comparison. (2) The system management module provides personal-information management for ordinary users and administrators; administrators can also maintain and manage ordinary users through this module. (3) The data management module provides the administrator with database-maintenance functions; by adding and deleting database entries, the administrator can expand the user search range and improve search accuracy.
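The adversarial association step in improvement (2) can be illustrated with a minimal NumPy sketch, not the thesis's actual networks: two toy linear encoders stand in for the per-modality deep networks, and a logistic-regression discriminator tries to tell which modality a feature came from, while the encoders are updated in the opposite direction (gradient reversal) so the two feature distributions become indistinguishable. All names, dimensions, and the linear-encoder choice here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat, n = 32, 16, 200

# Toy "image" and "text" inputs drawn from different distributions.
x_img = rng.normal(1.0, 1.0, (n, d_in))
x_txt = rng.normal(-1.0, 1.0, (n, d_in))

W_img = rng.normal(0, 0.1, (d_in, d_feat))   # image encoder (stand-in for VGGNet/ResNet)
W_txt = rng.normal(0, 0.1, (d_in, d_feat))   # text encoder (stand-in for Word2vec+CNN)
w, b = rng.normal(0, 0.1, d_feat), 0.0       # modality discriminator

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

lr = 0.05
y = np.concatenate([np.ones(n), np.zeros(n)])    # 1 = image, 0 = text
for _ in range(300):
    f = np.vstack([x_img @ W_img, x_txt @ W_txt])
    p = sigmoid(f @ w + b)
    g = p - y                                     # dBCE/dlogit per sample
    # Discriminator step: descend the BCE so it separates the modalities.
    w -= lr * f.T @ g / (2 * n)
    b -= lr * g.mean()
    # Encoder step (gradient reversal): ascend the BCE so the modality
    # of a feature becomes indistinguishable.
    g_f = np.outer(g, w)                          # dBCE/dfeature
    W_img += lr * x_img.T @ g_f[:n] / n
    W_txt += lr * x_txt.T @ g_f[n:] / n

feats = np.vstack([x_img @ W_img, x_txt @ W_txt])
acc = float(((sigmoid(feats @ w + b) > 0.5) == y).mean())
```

In the full system the same idea applies to all four modalities at once, with the deep per-modality networks in place of the linear maps; the closer the discriminator's accuracy falls toward chance, the more consistent the cross-modal feature distributions are.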
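Improvement (3) and the retrieval module's similarity comparison can likewise be sketched. The snippet below, again an illustrative assumption rather than the thesis's learned hash layer, uses a random sign-projection hash to turn real-valued features into binary codes and ranks database items by Hamming distance:

```python
import numpy as np

rng = np.random.default_rng(1)
d_feat, n_bits = 16, 64

# Random projection followed by sign thresholding; a simple stand-in
# for the learned hashing network described in the abstract.
P = rng.normal(0, 1, (d_feat, n_bits))

def hash_code(features):
    """Map real-valued feature rows to {0,1} binary codes."""
    return (features @ P > 0).astype(np.uint8)

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

# Pretend database of 100 already-extracted feature vectors.
db_feats = rng.normal(0, 1, (100, d_feat))
db_codes = hash_code(db_feats)

# A query that is a near-duplicate of database item 42.
query = db_feats[42] + rng.normal(0, 0.01, d_feat)
q_code = hash_code(query[None, :])[0]

dists = [hamming(q_code, c) for c in db_codes]
best = int(np.argmin(dists))   # index of the nearest database item
```

Comparing short binary codes with XOR-style bit counts is far cheaper than comparing high-dimensional real vectors, which is the efficiency gain the abstract claims for the hashing step.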
Keywords/Search Tags: Deep learning, Hash conversion, Adversarial network, Cross-modal