
Cross-modal Retrieval And Annotation Based On Hashing Learning Method

Posted on: 2018-03-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Wang
Full Text: PDF
GTID: 1368330566451332
Subject: Computer software and theory
Abstract/Summary:
With the advent of the big-data era, multimedia data on the Internet has grown explosively, and managing and applying these data resources intelligently poses new challenges for researchers. Beyond its sheer volume, multimedia data is typically multi-modal, comprising images, text, video, and audio. To meet the application requirements of such diverse and complex multi-modal data, this thesis proposes intelligent retrieval and annotation methods based on hashing techniques.

Hash learning is an important research direction in data management. It leverages machine learning to map data to binary hash codes. Because hash codes are compact, hashing is an effective form of dimensionality reduction that greatly reduces storage overhead; it is also computationally efficient and saves substantial resources, which matters for large-scale multimedia data management. Moreover, traditional retrieval based on a single modality (such as image features or keywords) can no longer satisfy users' growing needs, so a more intelligent retrieval and annotation method is required. Many methods have been proposed to solve this problem, but there is still room to improve the semantic association of heterogeneous data. Cross-modal retrieval methods retrieve data across modalities, exploiting the semantic relationships among them to improve retrieval and annotation efficiency, and have attracted increasing attention from researchers. Our cross-modal retrieval technique is built on hash learning, which ensures its scalability. Hybrid hash learning based on multiple simple hash learners is an effective approach, but it remains a preliminary solution with many limitations; further improving cross-modal hash learning requires deeper study of its essence, including theoretical analysis, modeling, and parameter optimization.

In this work, we consider the semantic correlation among modalities and integrate multi-graph learning, cross-modal semantic correlation learning, and hash learning into a unified joint framework. Because the individual techniques complement one another, the joint framework is effective. By optimizing this framework, we obtain an optimized joint learning framework, S3FH. S3FH employs hash learning: it not only maps semantic information through hash functions into a common Hamming space, but also reduces noise in the input training data, further strengthening the semantic relationships among modalities. Comprehensive cross-modal retrieval experiments on real-world datasets show that S3FH outperforms state-of-the-art methods.

Semantic retrieval over multi-modal data is effective, but existing cross-modal retrieval methods focus on the accuracy of retrieval algorithms and pay less attention to efficiency, which is crucial for large-scale data management. We therefore propose a cross-modal retrieval approach based on hash learning; since hashing is well known for its time and space efficiency, the approach handles large-scale data retrieval effectively. Specifically, we study BMSH, a retrieval method that considers both the image and text modalities. BMSH first maps image features and textual features into the same hash code space, and then applies a candidate ranking algorithm to rank image candidates. Because the approach preserves the semantic relationships between the image and text modalities, it effectively retrieves the most semantically related images; above all, its use of hashing makes it efficient on large-scale real-world multimedia data.

Semantic annotation of multimedia data plays an important role in intelligent data management, yet annotating massive image collections manually is time-consuming and laborious. Automatic image annotation requires establishing correlations between high-level semantic information and low-level features, and in practical applications only a small number of labeled images are available as training samples. We therefore use hashing to enhance the semantic relationship between the text and image modalities, mapping both text features and image features into the same Hamming space through learned hash functions; we call this method MMSHL. Based on MMSHL, we propose a two-step semi-supervised automatic image annotation method: in the first step, MMSHL predicts labels for unlabeled images, taking both labeled and unlabeled images as training data; in the second step, images are annotated with the model learned in the first step. Experimental results show that this two-step semi-supervised automatic image annotation algorithm based on MMSHL effectively improves the performance of automatic image annotation.
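The mechanics common to the methods above can be illustrated with a minimal sketch: map image and text features into a shared Hamming space via hash functions, then rank candidates by Hamming distance. This sketch uses fixed random projections purely for illustration; the thesis's S3FH, BMSH, and MMSHL instead *learn* the projections so that semantically related cross-modal pairs receive similar codes, and all dimensions and names below are illustrative assumptions, not the thesis's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: image features, text features, hash code length.
D_IMG, D_TXT, N_BITS = 512, 300, 32

# One projection per modality into a common N_BITS-dimensional code space.
# (Random here; in the thesis these would be learned jointly.)
W_img = rng.standard_normal((D_IMG, N_BITS))
W_txt = rng.standard_normal((D_TXT, N_BITS))

def hash_codes(X, W):
    """Map real-valued features to binary codes in a common Hamming space."""
    return (X @ W > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query, ascending."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists, kind="stable"), dists

# Cross-modal retrieval: a text query against an image database.
images = rng.standard_normal((1000, D_IMG))   # synthetic image features
text_query = rng.standard_normal(D_TXT)       # synthetic text features

db_codes = hash_codes(images, W_img)                 # 1000 x 32 binary codes
q_code = hash_codes(text_query[None, :], W_txt)[0]   # 32-bit query code
order, dists = hamming_rank(q_code, db_codes)
print(order[:5])  # indices of the top-5 candidate images
```

Comparing 32-bit codes instead of 512-dimensional real vectors is what gives hashing its storage and speed advantage: Hamming distance reduces to XOR plus popcount, which is why the abstract stresses scalability to large multimedia collections.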
Keywords/Search Tags: Hash Learning, Cross-modal Hashing, Multi-modal Fusion, Semantic Decomposition, Multi-Graph Learning, Semi-supervised Learning, Automatic Image Annotation