
Semantic Classification And Retrieval For Cross-media Data

Posted on: 2017-03-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y C Wei
Full Text: PDF
GTID: 1108330485961196
Subject: Signal and Information Processing

Abstract/Summary:
With the development of information techniques and social networks, cross-media data such as images, text, audio, and video have been constantly changing the way people live and work. Understanding the semantics of cross-media data and analyzing the relationships among modalities have attracted many researchers in the fields of cross-media analysis and pattern recognition. Targeting cross-media data, this thesis addresses three key problems: semantic enhancement with cross-media data, cross-media retrieval, and multiple-attribute learning. The main contributions are four-fold.

1. We propose a cross-media semantic enhancement framework to benefit the image retrieval task. The goal of the semantic enhancement framework is to find a mapping matrix by exploiting the correlation between visual features and textual features. Thus, the original noisy distribution of the visual features can be refined by leveraging the discriminative distribution of the corresponding textual features. Experimental results demonstrate the effectiveness of the proposed method.

2. We propose a task-specific cross-media retrieval (TSCR) method. By jointly optimizing the correlation between image and text and the linear regression from each modal space (image or text) to the semantic space, two pairs of mappings are learned that project images and text from their original feature spaces into two common latent subspaces (one for image-to-text retrieval and the other for text-to-image retrieval). Experimental results demonstrate the effectiveness of the proposed method.

3. We propose a deep Semantic Matching (deep-SM) method to address the cross-media retrieval problem for samples annotated with one or multiple labels. Within the deep-SM framework, two kinds of deep neural networks are learned to map images and text into an isomorphic semantic space. In addition, cross-media retrieval with Convolutional Neural Network (CNN) visual features is also implemented with several classic methods. Experimental results demonstrate the effectiveness of the proposed method, and of CNN visual features, for cross-media retrieval.

4. We propose a flexible deep CNN infrastructure called Hypotheses-CNN-Pooling (HCP) for multiple-attribute learning. In HCP, we first propose a hypothesis selection method to choose a small number of proper hypotheses for each multi-label image. Then, a shared CNN accompanied by a cross-hypothesis max-pooling operation is used to build a unified framework for multi-label classification. The proposed HCP is trained in an end-to-end manner and achieves state-of-the-art performance on Pascal VOC 2007 and VOC 2012.
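The mapping-matrix idea behind the semantic enhancement framework can be illustrated with a simple linear sketch. The thesis does not specify its exact objective here; the version below is an assumed ridge-regression formulation, where a matrix W is learned to project visual features toward the paired textual feature space, so that the projected (refined) visual features inherit the more discriminative textual distribution.

```python
import numpy as np

def learn_mapping(V, T, lam=1e-2):
    """Learn W minimizing ||V W - T||^2 + lam ||W||^2 (ridge regression).

    V: (n, dv) visual features; T: (n, dt) textual features of the same n items.
    Illustrative only -- the thesis' actual objective may differ.
    """
    dv = V.shape[1]
    # Closed-form solution of the regularized least-squares problem.
    return np.linalg.solve(V.T @ V + lam * np.eye(dv), V.T @ T)

# Toy data: 100 paired samples, 64-d visual and 32-d textual features.
rng = np.random.default_rng(0)
V = rng.normal(size=(100, 64))
T = rng.normal(size=(100, 32))
W = learn_mapping(V, T)
V_refined = V @ W  # visual features projected toward the textual space
print(V_refined.shape)
```

At retrieval time the refined features `V @ W` would replace the raw visual features, so that images are compared in a space shaped by their textual descriptions.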
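The cross-hypothesis max-pooling step in HCP can be sketched as follows: the shared CNN scores every hypothesis (region proposal) against every label, and the image-level score for each label is the maximum over hypotheses. This is a minimal NumPy sketch of that pooling operation only, not of the full HCP pipeline; the array names are illustrative.

```python
import numpy as np

def cross_hypothesis_max_pool(hyp_scores):
    """Fuse per-hypothesis label scores by max-pooling over hypotheses.

    hyp_scores: (num_hypotheses, num_labels) array, e.g. the shared CNN's
    per-label confidences for each image hypothesis.
    Returns a (num_labels,) score vector for the whole image.
    """
    return hyp_scores.max(axis=0)

# Three hypotheses scored over four labels; each image-level label score is
# the best score any single hypothesis achieved for that label.
scores = np.array([[0.1, 0.9, 0.2, 0.0],
                   [0.7, 0.3, 0.1, 0.2],
                   [0.2, 0.4, 0.8, 0.1]])
print(cross_hypothesis_max_pool(scores))  # [0.7 0.9 0.8 0.2]
```

Max-pooling makes the fusion insensitive to how many hypotheses are noisy: a label fires as long as one hypothesis scores it highly, which suits multi-label images where each object may dominate only one proposal.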
Keywords/Search Tags: image retrieval, cross-media retrieval, multi-label classification, subspace learning, deep learning