
Research On Cross-modal Retrieval Technology Based On Hashing Learning

Posted on: 2020-01-04  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Li  Full Text: PDF
GTID: 2428330590473926  Subject: Computer Science and Technology
Abstract/Summary:
Artificial intelligence technology has become increasingly mature. Many high-tech companies built on it have sprung up rapidly, and quite a few enterprises have produced AI products that are changing our daily lives. Such remarkable progress was not achieved overnight: since the birth of AI in 1956, the field has gone through several booms and winters. The current boom is stronger than previous ones because it has a distinct characteristic: it is driven by big data. Big data is notable not only for its sheer volume, but also for the diversity of its data types and its low value density.

In recent years, multimedia data on the Internet, such as text, images, audio, and video, have grown explosively. We generate and receive all kinds of information every day; this information is recorded and then analyzed with various AI techniques to learn our daily behavior and habits, in order to provide convenient services for our lives. Among these massive multimedia data, some items are dependent on each other: they are likely to represent the same thing in different ways and share a certain semantic similarity. Retrieving related data across these different types has gradually become an urgent need and has attracted wide attention from the academic community; this task is cross-modal retrieval.

The fundamental goal of cross-modal retrieval is to find semantically similar samples across different modalities: given data of one modality as the query, it retrieves semantically similar data of other modalities from a database. Hashing, which effectively reduces storage cost and speeds up retrieval, has gradually become a common approach to the cross-modal retrieval problem. However, existing hashing-based cross-modal retrieval methods generally do not make good use of labeled data, and the imbalance between positive and negative examples in existing datasets also degrades retrieval performance.

To address these problems, this thesis proposes a self-supervised multimodal joint hashing method. The method models the semantic relevance of data by extracting features from class-label data, so that under the supervision of these semantic features the feature distributions of image and text data converge, which helps the hash-learning networks of the different modalities learn the semantic similarity between data of similar modalities more accurately. A semantic-preservation module is added to optimize a classification loss, so that the generated hash codes retain semantic similarity as much as possible. An adaptive-weight loss function is designed that flexibly adjusts the penalty on the classification errors of positive and negative samples according to their proportion in each batch fed to the neural network. A binary-constraint regularization term is used to minimize quantization error and keep the generated approximate hash codes close to +1 or -1, improving retrieval accuracy. Finally, to verify the practical effect of the algorithm, we compare it with several popular cross-modal retrieval algorithms on several public datasets. The experimental results show that the algorithm effectively improves the accuracy of cross-modal retrieval.
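The adaptive-weight loss and the binary-constraint regularizer described above are concrete enough to sketch. Below is a minimal PyTorch-style sketch under assumptions of ours, not the thesis's: the function name adaptive_weight_hash_loss, the per-batch inverse-frequency weighting scheme, and the reg_weight hyperparameter are all illustrative.

    import torch
    import torch.nn.functional as F

    def adaptive_weight_hash_loss(logits, labels, hash_codes, reg_weight=0.1):
        # logits:     (batch, num_classes) raw classifier outputs
        # labels:     (batch, num_classes) multi-hot ground-truth labels
        # hash_codes: (batch, code_len) real-valued approximate hash codes
        # Proportion of positive entries in this batch (clamped to avoid
        # division by zero); the rarer side receives the larger weight.
        # This weighting scheme is a hypothetical stand-in for the thesis's.
        pos_ratio = labels.float().mean().clamp(1e-6, 1.0 - 1e-6)
        pos_weight = (1.0 - pos_ratio) / pos_ratio  # scalar, broadcasts over classes

        # Classification loss with the adaptive weight on positive samples.
        cls_loss = F.binary_cross_entropy_with_logits(
            logits, labels.float(), pos_weight=pos_weight)

        # Binary-constraint regularizer: penalize each code entry's distance
        # from its own sign, pushing approximate codes toward +1 or -1.
        quant_loss = (hash_codes - hash_codes.sign().detach()).pow(2).mean()

        return cls_loss + reg_weight * quant_loss

Weighting positives by the inverse of their batch frequency is one simple way to realize "adjusting the penalty according to the positive/negative ratio"; the thesis does not specify its exact formula, so treat this as a placeholder design rather than the author's method.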
Keywords/Search Tags: multimedia, learning to hash, cross-modal retrieval, self-supervised learning