Text-aided Image Classification: Using Labeled Text From Web To Help Image Classification

Posted on: 2011-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lin
GTID: 2178360308452441
Subject: Computer application technology
Abstract:
As more and more multimedia data become available on the World Wide Web, mining those data is playing an increasingly important role in web applications. A large amount of labeled text data exists on the web, and it is much easier to represent and mine knowledge from text data than from multimedia data. This motivates investigating the interplay between multimedia data and text data, in the hope that text can help us understand multimedia data better. Maximizing the benefit drawn from text information is therefore a crucial problem in multimedia data mining.

In this paper, we address the image classification problem as an entry point for mining across the media data space and the text data space. We solve the problem of image classification with a very limited amount of labeled training images, using an approach we call the text-aided image classifier (TAIC). This problem is important in practice because labeled text data on the web are currently far more plentiful than image data. To solve it, building on the bag-of-words view and the Naive Bayes (NB) classification model, we focus on estimating the image feature distribution of the target concept with the help of abundant labeled text data and image-text co-occurrence data on the web.

Specifically, we extend the traditional NB algorithm with a mapping, which we call "feature mapping", that maps the most discriminative text features found in the labeled text training data into the image feature space. This procedure relies on the abundant image-text co-occurrence data on the web, which act as a bridge connecting text and image knowledge. The essence of our algorithm is to use a text feature distribution, estimated from sufficient labeled text data, to estimate the image feature distribution under the same target concept.

Our empirical results on real-world data sets show that our method gives a good approximation of the image feature distribution obtained when training with abundant labeled images. When labeled images are very limited, classification performance is greatly improved by using auxiliary labeled text data. Finally, our mixed classification model, which accepts both labeled images and text as training data, achieves better classification performance across training image sets of various sizes. This shows that our method can indeed integrate text and image knowledge and improve the performance of image classification in a simple yet effective way.
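
The abstract does not give the formulas behind TAIC, so the following is only a minimal sketch of one way the described pipeline could be realized, not the thesis's actual implementation. It assumes bag-of-words count vectors for text and bag-of-visual-words count vectors for images, estimates P(visual word | class) as the sum over text words of P(visual word | text word) P(text word | class), and mixes that estimate with an image-based one using a weight `lam`. The function names, the Laplace smoothing constant `alpha`, the log-ratio score used to pick discriminative text features, and the linear mixing rule are all assumptions made for illustration.

```python
import numpy as np


def class_distributions(counts, labels, n_classes, alpha=1.0):
    """Laplace-smoothed P(feature | class); returns shape (n_classes, n_features)."""
    totals = np.zeros((n_classes, counts.shape[1]))
    for c in range(n_classes):
        totals[c] = counts[labels == c].sum(axis=0)
    totals += alpha  # smoothing keeps every probability strictly positive
    return totals / totals.sum(axis=1, keepdims=True)


def feature_mapping(cooccurrence, alpha=1.0):
    """P(visual word | text word) from image-text co-occurrence counts.

    cooccurrence has shape (n_text_words, n_visual_words).
    """
    smoothed = cooccurrence + alpha
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def mapped_image_distributions(p_word_given_class, p_visual_given_word, top_k=None):
    """Estimate P(visual word | class) by pushing text distributions through the mapping.

    If top_k is given, only the top_k text words per class are kept, ranked by an
    assumed log-ratio score against the average of the other classes.
    """
    p_wc = p_word_given_class.copy()
    n_classes = p_wc.shape[0]
    if top_k is not None:
        others = (p_wc.sum(axis=0, keepdims=True) - p_wc) / max(n_classes - 1, 1)
        score = np.log(p_wc) - np.log(others)
        keep = np.argsort(-score, axis=1)[:, :top_k]
        mask = np.zeros(p_wc.shape, dtype=bool)
        np.put_along_axis(mask, keep, True, axis=1)
        p_wc = np.where(mask, p_wc, 0.0)
        p_wc = p_wc / p_wc.sum(axis=1, keepdims=True)  # renormalize kept words
    return p_wc @ p_visual_given_word


def mixed_distributions(p_mapped, p_image, lam):
    """Convex combination of the image-based and the text-mapped estimates."""
    return lam * p_image + (1.0 - lam) * p_mapped


def classify(image_counts, p_visual_given_class, log_prior=None):
    """Multinomial Naive Bayes decision: argmax of the class log-likelihood."""
    scores = image_counts @ np.log(p_visual_given_class).T
    if log_prior is not None:
        scores = scores + log_prior
    return scores.argmax(axis=1)


# Toy data (entirely made up): 2 classes, 3 text words, 4 visual words.
text_counts = np.array([[5, 0, 1], [4, 1, 1], [0, 6, 1], [1, 5, 1]])
text_labels = np.array([0, 0, 1, 1])
cooccurrence = np.array([[8, 6, 1, 0], [0, 1, 7, 9], [2, 2, 2, 2]])

p_word = class_distributions(text_counts, text_labels, n_classes=2)
p_map = feature_mapping(cooccurrence)
p_visual_text_only = mapped_image_distributions(p_word, p_map)

# One labeled image per class stands in for the "very limited" image training set.
labeled_images = np.array([[2, 3, 0, 0], [0, 1, 3, 2]])
image_labels = np.array([0, 1])
p_visual_image_only = class_distributions(labeled_images, image_labels, n_classes=2)
p_visual_mixed = mixed_distributions(p_visual_text_only, p_visual_image_only, lam=0.5)

test_images = np.array([[3, 2, 0, 0], [0, 0, 4, 1]])
predictions = classify(test_images, p_visual_mixed)
```

In this sketch, setting `lam` to 0 corresponds to classifying images using text-derived knowledge alone, while larger values lean more on the labeled images; how the thesis actually weights the two sources is not stated in the abstract.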
Keywords: image classification, co-occurrence, feature mapping