A Study On Automatic Image And Video Annotation

Posted on: 2013-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Peng
GTID: 2248330374483541
Subject: Computer system architecture
Abstract/Summary:
Following the success of text-based image retrieval, commercial search engines such as Google and Baidu have introduced their own text-based image retrieval systems. Unfortunately, only a very small fraction of the large-scale digital media on the Internet carries text labels or related text descriptions, while the volume of multimedia resources continues to grow explosively. To better manage and make full use of these data, the automatic annotation of images and videos has attracted considerable research attention in recent years. Over the past decade, a variety of solutions based on statistical models and classification models have been proposed, but the results remain unsatisfactory.
Multi-instance multi-label learning (MIML) is a recently proposed learning framework that has been applied to image classification and annotation tasks because of its ability to represent ambiguous objects in images. In MIML, an example is described by multiple instances and associated with multiple class labels. Compared with traditional learning frameworks, MIML is a more convenient and natural way to represent complicated objects that have multiple semantic meanings, and some studies show that on certain tasks it achieves better performance. Because the multi-instance representation of an image is the key to resolving image ambiguity, the way the instances are generated has an important effect on annotation results. Two ways of generating instances, region-based and grid-based, are evaluated in this thesis, and a new image segmentation algorithm is adopted for region-based instance generation. Representing an image as multiple instances increases the cost of computing the similarity between images, so a multi-instance kernel function is used, which reduces the computational complexity without losing the ability to represent ambiguous objects.
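The abstract does not state which features or which multi-instance kernel the thesis uses, so the following is only a minimal sketch under stated assumptions. It illustrates grid-based instance generation (each grid cell's mean colour is taken as one instance, an assumed feature) and a generic normalized set kernel that sums an RBF kernel over all instance pairs of two bags. The names `grid_instances`, `set_kernel`, and `normalized_set_kernel` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def grid_instances(image, rows, cols):
    """Split an H x W x C image into rows*cols cells.

    Assumption: each cell is summarized by its mean colour, giving one
    C-dimensional instance per cell. The thesis may use other features.
    """
    h, w, c = image.shape
    r_edges = np.linspace(0, h, rows + 1, dtype=int)
    c_edges = np.linspace(0, w, cols + 1, dtype=int)
    bag = []
    for i in range(rows):
        for j in range(cols):
            cell = image[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            bag.append(cell.reshape(-1, c).mean(axis=0))
    return np.array(bag)  # shape: (rows * cols, C)

def set_kernel(bag_a, bag_b, gamma=1.0):
    """Sum an RBF kernel over every pair of instances from the two bags."""
    diff = bag_a[:, None, :] - bag_b[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return float(np.exp(-gamma * sq_dist).sum())

def normalized_set_kernel(bag_a, bag_b, gamma=1.0):
    """Normalize so that a bag compared with itself scores 1."""
    k_ab = set_kernel(bag_a, bag_b, gamma)
    k_aa = set_kernel(bag_a, bag_a, gamma)
    k_bb = set_kernel(bag_b, bag_b, gamma)
    return k_ab / np.sqrt(k_aa * k_bb)
```

A kernel of this kind compares two images with a single closed-form expression over their instance features, so no explicit instance-to-instance matching step is needed, and the resulting value can be passed to any kernel-based classifier.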
Video annotation is a natural extension of image annotation, and many researchers address it by applying image annotation methods directly. These traditional methods, however, ignore the temporal dimension, which is an important aspect of video. In this thesis, a temporally consistent weighted multi-instance kernel is developed to take into account both the temporal consistency of video data and the visual feature representation. To improve the generalization ability of the model, the thesis also proposes two ensemble learning algorithms for the automatic annotation of images and videos.
In this thesis, we 1) analyze existing multi-instance image annotation methods and experimentally evaluate two ways of generating multiple instances; 2) propose a new MIML-based image annotation method that uses a multi-instance kernel function to evaluate image similarities; and 3) propose a temporally consistent weighted multi-instance kernel for video data and a new AdaBoost-based ensemble method for video auto-annotation. All algorithms are evaluated on several benchmark datasets, including Corel5k and IAPR TC12 for images and TRECVID 2005 for video. The experimental results show that our methods improve the accuracy of image and video annotation and outperform several state-of-the-art methods.
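The abstract names a temporally consistent weighted multi-instance kernel but does not define it, so the sketch below is one plausible reading and not the thesis's formulation. It assumes a video shot is a bag of frame feature vectors in temporal order, weights each frame by its RBF similarity to its temporal neighbours (so frames that agree with their neighbours count more), and uses those weights inside a set kernel. The names `temporal_weights` and `weighted_set_kernel` and the parameter `gamma` are hypothetical.

```python
import numpy as np

def temporal_weights(frames, gamma=1.0):
    """Weight each frame by its mean RBF similarity to adjacent frames.

    frames: (T, d) array of frame features in temporal order.
    Assumption: temporal consistency is measured only between
    immediate neighbours. Weights are normalized to sum to 1.
    """
    t = len(frames)
    if t == 1:
        return np.ones(1)
    sq_dist = ((frames[1:] - frames[:-1]) ** 2).sum(axis=1)
    sim = np.exp(-gamma * sq_dist)          # similarity of frame k and k+1
    w = np.zeros(t)
    w[:-1] += sim                           # similarity to the next frame
    w[1:] += sim                            # similarity to the previous frame
    neighbours = np.full(t, 2.0)
    neighbours[0] = neighbours[-1] = 1.0    # end frames have one neighbour
    w /= neighbours
    total = w.sum()
    if total <= 0.0:                        # all similarities underflowed
        return np.full(t, 1.0 / t)
    return w / total

def weighted_set_kernel(video_a, video_b, gamma=1.0):
    """Set kernel whose instance pairs are scaled by temporal weights."""
    w_a = temporal_weights(video_a, gamma)
    w_b = temporal_weights(video_b, gamma)
    diff = video_a[:, None, :] - video_b[None, :, :]
    pairwise = np.exp(-gamma * (diff ** 2).sum(axis=2))
    return float(w_a @ pairwise @ w_b)
```

Under these assumptions, a frame that differs sharply from its neighbours (for example, a flash or a transition frame) contributes little to the similarity between two shots, while the kernel remains a weighted sum of RBF kernels and is therefore still symmetric.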
Keywords/Search Tags: Image Auto-Annotation, Video Auto-Annotation, Temporal Consistency, Multi-Instance Kernel, Ensemble Learning