
Design and Implementation of a DCGAN-Based Image-Text Cross-Modal Retrieval System

Posted on: 2021-01-24
Degree: Master
Type: Thesis
Country: China
Candidate: T T Gou
Full Text: PDF
GTID: 2428330605469269
Subject: Computer technology
Abstract/Summary:
With the maturity of modern network technology, Internet users generate large volumes of multimedia data in different modalities (such as image, text, audio, and video) through their online social activity, which has driven the development of multimedia-related research. Cross-modal retrieval, one of the current research hotspots, refers to the mutual retrieval of multi-modal data such as image, text, and audio.

On the basis of common-subspace learning, this thesis applies the extreme learning machine (ELM) to improve cross-modal retrieval accuracy, mining deeper data features and maximizing the correlation between modalities so that the learned shared subspace is more discriminative. In addition, cross-modal retrieval is realized with a deep convolutional generative adversarial network (DCGAN), which further mines the correlation between different modalities by exploiting unlabeled samples, improving retrieval performance. The main work and innovations are as follows:

1. A cross-modal retrieval method based on the extreme learning machine is proposed. The method measures similarity by using a single-hidden-layer neural network to compute distances between semantic representations, then introduces class labels and applies supervised learning to strengthen the model's capacity for independent learning; the resulting model generalizes better and learns a more discriminative shared subspace. Comparative experiments on public datasets show that the method improves cross-modal retrieval accuracy. (A minimal ELM sketch is given after the abstract.)

2. A cross-modal retrieval method based on the deep convolutional generative adversarial network is proposed. The method takes DCGAN as its foundation and fuses in deep canonical correlation analysis (DCCA), which makes more effective use of the correlation between modalities and of unlabeled samples during cross-modal retrieval. A DCCA constraint is added between the two unimodal representation layers for image and text, and an image-text feature projection model is constructed to mine the semantic correlation of sample pairs (see the DCCA loss sketch below). On this basis, DCGAN is adopted as the framework of the whole model and trained with unlabeled samples: the image-text feature projection model acts as the generator, while a convolutional neural network acts as the discriminator, serving as a modality classifier; the common subspace representation of the samples is then learned through the adversarial game between the two (see the training-loop sketch below). Experimental results show that this method outperforms other mainstream methods.

Finally, drawing on the experimental comparison with existing cross-modal retrieval methods, the proposed DCGAN-based method is used to design and implement an image-text cross-modal retrieval system. The system supports mutual retrieval between images and text, improves the accuracy of retrieval results, and meets users' needs for diverse modes of information retrieval.
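The following is a minimal sketch of the extreme-learning-machine idea described in contribution 1: a single hidden layer with random, fixed weights, and output weights solved in closed form. The feature dimensions, regularization value, and cosine-similarity ranking are illustrative assumptions, not the thesis's exact formulation.

import numpy as np

def elm_fit(X, Y, n_hidden=512, reg=1e-3, seed=0):
    # Random, fixed input weights and biases (the ELM trick):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Only the output weights are learned, via ridge-regularized
    # least squares: beta = (H^T H + reg*I)^(-1) H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def elm_project(X, W, b, beta):
    # Map features into the label (semantic) space learned by the ELM.
    return np.tanh(X @ W + b) @ beta

def retrieve(query_vec, gallery):
    # Rank gallery items of the other modality by cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))

Usage would be to fit one ELM per modality against the shared class labels, project both modalities with elm_project, and call retrieve with a projected query against the projected gallery of the other modality.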
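The DCCA constraint between the image and text representation layers can be sketched as a loss that maximizes the total canonical correlation of the two views. This is the generic DCCA objective written in PyTorch, assumed here for illustration; the thesis's exact constraint may differ.

import torch

def dcca_loss(H1, H2, eps=1e-4):
    # H1, H2: (batch, d) activations from the image and text branches.
    n = H1.size(0)
    H1 = H1 - H1.mean(dim=0, keepdim=True)  # center each view
    H2 = H2 - H2.mean(dim=0, keepdim=True)
    S11 = (H1.T @ H1) / (n - 1) + eps * torch.eye(H1.size(1))
    S22 = (H2.T @ H2) / (n - 1) + eps * torch.eye(H2.size(1))
    S12 = (H1.T @ H2) / (n - 1)
    def inv_sqrt(S):  # S^(-1/2) via eigendecomposition
        w, V = torch.linalg.eigh(S)
        return V @ torch.diag(w.clamp_min(eps).rsqrt()) @ V.T
    # Whitened cross-covariance T = S11^(-1/2) S12 S22^(-1/2); its
    # nuclear norm (sum of singular values) is the total canonical
    # correlation between the views, so we minimize its negative.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return -torch.linalg.matrix_norm(T, ord='nuc')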
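The adversarial setup of contribution 2 (projection networks as the generator, a modality classifier as the discriminator) might look roughly like the training loop below. The projector architecture, the feature dimensions (4096 for images, 300 for text), the MLP discriminator standing in for the thesis's CNN, and the loader variable are all assumptions for illustration; dcca_loss is reused from the sketch above.

import torch
import torch.nn as nn

class Projector(nn.Module):
    """Maps one modality's features into the common subspace (generator)."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim))
    def forward(self, x):
        return self.net(x)

img_proj, txt_proj = Projector(4096), Projector(300)  # hypothetical dims
disc = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(list(img_proj.parameters()) +
                         list(txt_proj.parameters()), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for img_feat, txt_feat in loader:  # assumed iterator over feature batches
    z_img, z_txt = img_proj(img_feat), txt_proj(txt_feat)
    # Discriminator step: classify which modality an embedding came from.
    d_loss = (bce(disc(z_img.detach()), torch.ones(len(z_img), 1)) +
              bce(disc(z_txt.detach()), torch.zeros(len(z_txt), 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator step: fool the modality classifier (reversed labels)
    # while keeping the two views correlated via the DCCA constraint.
    g_loss = (bce(disc(z_img), torch.zeros(len(z_img), 1)) +
              bce(disc(z_txt), torch.ones(len(z_txt), 1)) +
              dcca_loss(z_img, z_txt))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Once the discriminator can no longer tell image embeddings from text embeddings, the two projectors have mapped both modalities into a common subspace in which nearest-neighbor search implements image-to-text and text-to-image retrieval.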
Keywords/Search Tags: cross-modal retrieval, deep canonical correlation analysis, adversarial learning, deep convolutional generative adversarial networks