
Video Annotation Based On Transfer Learning

Posted on: 2015-11-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 1228330422493440
Subject: Computer application technology
Abstract/Summary:
Video annotation is a highly active research topic in information retrieval and pattern recognition, with a multitude of applications such as surveillance and video understanding. With the rapid development of Internet multimedia techniques and personal handheld devices in recent years, video annotation is becoming an increasingly important part of information retrieval.

Traditional content-based video annotation mainly relies on labeling a large number of videos as training data and then building models to predict the content of unknown videos. To obtain good classifiers, these approaches require a sufficient number of labeled training videos, yet collecting enough labeled videos covering a diverse set of conditions is time-consuming and labor-intensive. Moreover, traditional approaches assume that the training and testing data come from the same domain, i.e., the same data distribution and the same feature space. In contrast, transfer learning allows the training and testing data to come from different domains. This characteristic makes it possible to learn well-generalized classifiers for a domain of interest that has only limited or even no labeled training data, by exploiting existing data from other related sources.

The work in this thesis concentrates on transferring knowledge from Web images (source domain) to consumer videos (target domain), and develops a series of transfer learning methods for annotating events in consumer videos. Specifically, motivated by increasingly mature Web image search engines, we first propose to obtain event-related images by submitting keyword queries to a search engine, and develop a transfer learning framework over the heterogeneous image and video data. To obtain more complete knowledge and avoid the bias of a single keyword, multiple associational keywords are used to retrieve multiple groups of Web images. Building on multi-group transfer learning, we further propose to query knowledge with concept-related keywords, using multiple semantic groups to narrow the semantic gap between low-level image features and high-level event concepts. Finally, we investigate technologies related to Internet big data and apply incremental transfer learning to adapt classifiers to rapidly changing multimedia data. The contributions of the thesis can be summarized as follows:

First, we propose a transfer learning framework that queries knowledge from a Web image search engine with keywords, and transfers knowledge from images to videos by building a common feature space between the image and video feature spaces. Under this framework, a Cross-Domain Structural Model is built to jointly learn the relations between images and videos as well as the relations among different image attributes. Using canonical correlation analysis, the heterogeneous features of the two domains can be learned in a unified framework.
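As a rough illustration of this common-space idea, the sketch below is a simplified stand-in, not the thesis's Cross-Domain Structural Model: it assumes a set of corresponded image/video feature pairs and uses scikit-learn's CCA to project both heterogeneous feature spaces into a shared subspace, where a classifier trained on labeled Web images can score consumer videos. The pairing, feature dimensions, and the linear SVM are all illustrative assumptions.

```python
# Minimal sketch of CCA-based common-space transfer (hypothetical setup, not
# the thesis's exact model): project paired image and video features into a
# shared subspace, train on the image side, predict on the video side.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_pairs = 200                              # corresponded image/video pairs (assumed)
X_img = rng.normal(size=(n_pairs, 100))    # e.g., bag-of-words image features
X_vid = rng.normal(size=(n_pairs, 80))     # e.g., space-time video features
y_img = rng.integers(0, 2, size=n_pairs)   # event labels on the image side

# Learn projections that maximize correlation between the two views.
cca = CCA(n_components=10)
cca.fit(X_img, X_vid)
Z_img, Z_vid = cca.transform(X_img, X_vid)

# Train in the common space on labeled Web images ...
clf = LinearSVC().fit(Z_img, y_img)
# ... and annotate consumer videos projected into the same space.
video_labels = clf.predict(Z_vid)
```

In the thesis's setting the cross-domain relations, including those among image attributes, are learned jointly rather than assumed as given pairs.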
Second, to avoid the bias caused by a single query keyword, we introduce a knowledge transfer framework based on multiple associational keywords. Given the multiple image groups returned by event-associational keywords, a Joint Group Weighting Learning framework is developed to adapt the classifiers learned from different but related image groups to consumer videos. Under this framework, we propose a Discriminative Topology Preserving Canonical Correlation Analysis to learn a new common feature space that carries discriminative information.

Third, to further accommodate the semantic diversity of consumer videos, we propose Multi-Group based Domain Adaptation, which leverages different groups of knowledge (source domain) queried from the Web image search engine for consumer videos (target domain). Different from traditional methods that use multiple source domains of images, our method organizes the Web images according to their intrinsic semantic relationships rather than their sources. Specifically, two types of groups (event-specific groups and concept-specific groups) are exploited to describe, respectively, the event-level and concept-level semantic meanings of the target-domain videos. To make the group weights and group classifiers mutually beneficial, a joint optimization algorithm with two novel data-dependent regularizers is presented to learn the weights and classifiers simultaneously, as sketched below.

Finally, rapidly emerging Web images and consumer videos make a previously learned model outdated and unsuitable for the growing volume of new data. We therefore propose to incrementally learn knowledge from the Web, balancing the transfer of new knowledge against the preservation of what has already been learned from the source domain. To measure the relevance between the groups and individual target videos, the group weights are treated as latent variables for each target-domain video.
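The group-weighting idea running through these contributions can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: one classifier per keyword group, a single weight vector shared across videos (the thesis's latent per-video weights are not modeled), and an agreement-based alternating update standing in for the joint optimization with data-dependent regularizers. All names and the synthetic data are hypothetical.

```python
# Minimal sketch of multi-group knowledge transfer (illustrative only): train
# one classifier per keyword group of Web images, then weight the groups by
# how well each agrees with the weighted consensus on the target videos.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_groups, dim = 5, 50
groups = []                          # (features, labels) per keyword group
for g in range(n_groups):
    Xg = rng.normal(loc=0.1 * g, size=(100, dim))
    yg = rng.integers(0, 2, size=100)
    groups.append((Xg, yg))
X_vid = rng.normal(size=(40, dim))   # unlabeled consumer videos (assumed features)

# One classifier per group; positive-class scores on the target videos.
clfs = [LogisticRegression(max_iter=1000).fit(Xg, yg) for Xg, yg in groups]
scores = np.stack([c.predict_proba(X_vid)[:, 1] for c in clfs])  # (groups, videos)

# Alternate: form a consensus from current weights, then re-weight each group
# by its agreement with that consensus (softmax over negative squared error).
w = np.full(n_groups, 1.0 / n_groups)
for _ in range(10):
    consensus = w @ scores                        # weighted prediction per video
    agreement = -((scores - consensus) ** 2).mean(axis=1)
    w = np.exp(agreement / 0.1)
    w /= w.sum()

video_scores = w @ scores            # final event scores for each target video
```

The design point this sketch captures is the mutual dependence between weights and predictions: groups that agree with the consensus gain influence, and the consensus is in turn recomputed from the updated weights.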
Keywords/Search Tags: video retrieval, video annotation, transfer learning, domain adaptation, incremental transfer learning