Zero-Shot Image Classification Based On Multimodal Embedding

Posted on: 2018-09-05 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Z Xie | Full Text: PDF
GTID: 2348330542979592 | Subject: Information and Communication Engineering
Abstract/Summary:
Traditional image classification techniques require a training set that contains every class to be predicted. However, with the sharp increase in image data, it is sometimes difficult to provide labeled training samples for each image class. To address this issue, Zero-Shot Learning (ZSL) has been proposed to classify classes that do not appear in the training set. This thesis studies ZSL from two aspects: semantic feature space construction and distance measurement.

Since ZSL must classify categories that are not present in the training set, additional side information is needed as a bridge between the seen and unseen categories, so that knowledge learned from the seen categories can be applied to the unseen ones. Most existing methods use semantic features as side information, such as text features corresponding to the class names or attribute features abstracted from the category objects. Some approaches map visual features into the attribute space or the text feature space; others map visual features and semantic features into a common feature space. Finally, using the relationship between image features and semantic features learned from the seen categories, test samples are classified by associating them with the semantic features of the unseen categories. Because few existing approaches employ multiple semantic features together, this thesis proposes a Multi-Battery Factor Analysis (MBFA) method, which extends Inter-Battery Factor Analysis (IBFA) and is based on the idea of a common feature space. It maps image features and semantic features of multiple modalities into one common feature space. Compared with traditional semantic feature space construction methods, this method can make full use of the complementary knowledge in different types of side information. Experimental results on three generic ZSL datasets demonstrate the effectiveness of the proposed MBFA-ZSL approach.

In the semantic feature space, a reasonable distance measure can accurately reflect the relationship between visual features and semantic features, which helps improve classification performance. Therefore, this thesis also studies how to measure the distance between visual features and semantic features in the semantic feature space. Existing ZSL methods usually use the traditional Euclidean distance, which assumes that all feature dimensions are equally important and therefore cannot effectively describe the relationship between sample features. To better describe the distance between image features and semantic features, this thesis introduces Distance Metric Learning (DML) into ZSL, implemented with Canonical Correlation Analysis (CCA). The results show that DML effectively describes the distance relationship between image features and semantic features in the common feature space, thereby improving classification performance. Compared with state-of-the-art methods, the proposed CCA-DML algorithm achieves better performance.
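
The general ZSL pipeline described above (learn a visual-to-semantic mapping on seen classes, then assign test samples to the nearest unseen-class semantic vector) can be illustrated with a minimal sketch. This is not the thesis's MBFA or CCA-DML method: it assumes a simple ridge-regression mapping and, as a stand-in for a learned metric, a Mahalanobis distance estimated from training residuals. The toy attribute vectors, the synthetic data model, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: every class is described by 3 binary attributes.
seen_protos = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
unseen_protos = np.array([[1, 0, 1], [0, 1, 1]], dtype=float)

# Assumed data model: visual feature = attributes @ A + Gaussian noise.
A = rng.normal(size=(3, 10))

def sample(protos, n_per_class, noise=0.05):
    """Draw synthetic visual features and labels for the given classes."""
    labels = np.repeat(np.arange(len(protos)), n_per_class)
    X = protos[labels] @ A + noise * rng.normal(size=(labels.size, A.shape[1]))
    return X, labels

def fit_visual_to_semantic(X, S, reg=1.0):
    """Ridge regression: find W such that X @ W approximates S."""
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ S)

def predict(X, W, protos, M=None):
    """Nearest-prototype prediction under squared distance (z-p)^T M (z-p).

    M=None means the identity matrix, i.e. plain Euclidean distance.
    """
    Z = X @ W                                   # project into semantic space
    diff = Z[:, None, :] - protos[None, :, :]   # shape (n, classes, dims)
    if M is None:
        M = np.eye(protos.shape[1])
    d2 = np.einsum("nkd,de,nke->nk", diff, M, diff)
    return d2.argmin(axis=1)

# Train on seen classes only; test on classes never seen during training.
X_train, y_train = sample(seen_protos, 50)
X_test, y_test = sample(unseen_protos, 50)
W = fit_visual_to_semantic(X_train, seen_protos[y_train])

# Stand-in for a learned metric: inverse covariance of training residuals,
# which down-weights semantic dimensions that are predicted less reliably.
resid = X_train @ W - seen_protos[y_train]
M = np.linalg.inv(np.cov(resid, rowvar=False) + 1e-6 * np.eye(3))

acc_euclid = (predict(X_test, W, unseen_protos) == y_test).mean()
acc_maha = (predict(X_test, W, unseen_protos, M) == y_test).mean()
print(f"Euclidean: {acc_euclid:.2f}  Mahalanobis: {acc_maha:.2f}")
```

On this easy synthetic data both distances should separate the unseen classes well; the point is only to show where a non-identity metric `M` enters the prediction step, which is the place a learned distance metric would replace the Euclidean one.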
Keywords/Search Tags: Zero-Shot Learning, Image Classification, Multi-Battery Factor Analysis, Canonical Correlation Analysis, Distance Metric Learning