Video Annotation With Multiple Feature Distance Learning

Posted on: 2013-12-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Li
GTID: 2248330395451102
Subject: Computer software and theory
Abstract/Summary:
Since the invention of computers, humans have been trying to help machines understand how we perceive the world. Over the years, the boundary of machine understanding has been pushed forward, beginning from numerical and symbolic computation. Text, images, and videos encode such high-level semantics that it would be of great significance if machines could mark these data with semantic labels, which in turn would be a first step toward machine understanding.

Due to the currently unavoidable semantic gap, semantic labeling usually confines itself to a defined setting rather than boldly attempting a general labeling system. Labeling video data with a predefined set of text labels makes a well-defined and solvable problem. Video data incorporates multi-modal features, including audio, static images, text, and motion features. Experiments show the effectiveness of motion features in distinguishing videos whose semantics are motion-related. Traditionally, video annotation systems do not dwell on the problem of effectively fusing multiple features to boost video classification results. We believe this is important because it helps to maximize the discriminative power of each individual feature. Further, this discriminative power varies across classes of videos.

We base our model on kernelized logistic regression. With multiple features extracted from video data, the multiple distance learning approach is employed to learn a weighted combination of features for each individual class. L1-norm regularization on the feature weights serves the purpose of feature selection, bringing the weights of non-relevant features down to 0. L2-norm regularization on the logistic regression parameters helps to avoid over-fitting the dataset. Since the objective function is not jointly convex in the two sets of parameters, an alternating optimization algorithm is used to minimize it.

We tested our approach on the Columbia Consumer Video (CCV) dataset. Experiments show that our approach substantially improved the video classification results and handles videos with motion-related semantics well. We conclude by pointing out that the L1-norm regularization on the weight parameters does indeed prune non-relevant features.
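
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea for a single class, under stated assumptions: per-feature distance matrices are combined with nonnegative weights `w`, turned into an exponential kernel, and fed to a kernelized logistic regression with parameters `alpha` and `b`. The exponential kernel form, the step-size rule, the backtracking projected-gradient update for `w`, and all names here (`combined_kernel`, `fit`, `toy_distances`, `lam1`, `lam2`) are illustrative choices, not the thesis's actual implementation.

```python
import numpy as np


def combined_kernel(dists, w):
    """K = exp(-sum_m w[m] * dists[m]); dists has shape (M, n, n)."""
    return np.exp(-np.tensordot(w, dists, axes=1))


def sigmoid_neg(z):
    """1 / (1 + exp(z)), written with tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh(0.5 * z))


def objective(dists, y, w, alpha, b, lam1, lam2):
    """Mean logistic loss + L1 penalty on w + L2 penalty on alpha."""
    K = combined_kernel(dists, w)
    margins = y * (K @ alpha + b)
    loss = np.logaddexp(0.0, -margins).mean()  # stable log(1 + exp(-m))
    return loss + lam1 * np.abs(w).sum() + lam2 * (alpha @ alpha)


def fit(dists, y, lam1=0.01, lam2=0.01, lr=0.1, outer=30, inner=20):
    """Alternate between (alpha, b) with w fixed and w with (alpha, b) fixed."""
    n_feat, n, _ = dists.shape
    w = np.full(n_feat, 1.0 / n_feat)
    alpha, b = np.zeros(n), 0.0
    # Kernel entries lie in [0, 1], so (n + 1) / 4 + 2 * lam2 bounds the
    # curvature of the (alpha, b) subproblem; its inverse is a safe step.
    step_ab = 1.0 / ((n + 1) / 4.0 + 2.0 * lam2)
    for _ in range(outer):
        # Step 1: w fixed, gradient descent on the convex (alpha, b) problem.
        K = combined_kernel(dists, w)
        for _ in range(inner):
            g = -y * sigmoid_neg(y * (K @ alpha + b)) / n  # d loss / d f
            alpha = alpha - step_ab * (K.T @ g + 2.0 * lam2 * alpha)
            b = b - step_ab * g.sum()
        # Step 2: (alpha, b) fixed, projected gradient step on w >= 0.
        g = -y * sigmoid_neg(y * (K @ alpha + b)) / n
        # dK/dw[m] = -dists[m] * K (elementwise)
        grad_w = np.array([g @ ((-D * K) @ alpha) for D in dists])
        before = objective(dists, y, w, alpha, b, lam1, lam2)
        step = lr
        for _ in range(20):  # halve the step until the objective does not rise
            w_new = np.maximum(w - step * (grad_w + lam1), 0.0)
            if objective(dists, y, w_new, alpha, b, lam1, lam2) <= before:
                w = w_new
                break
            step *= 0.5
    return w, alpha, b


def toy_distances(n=40, seed=0):
    """Hypothetical toy data: one informative feature, one pure-noise feature."""
    rng = np.random.default_rng(seed)
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    informative = y + 0.3 * rng.standard_normal(n)
    noise = rng.standard_normal(n)
    dists = np.stack([(f[:, None] - f[None, :]) ** 2
                      for f in (informative, noise)])
    return dists, y
```

Calling `fit(*toy_distances())` returns one weight per feature for this toy class. Because the weights are constrained to be nonnegative, the L1 penalty reduces to a constant pull toward zero, which is what lets a weight reach exactly 0 and drop a feature.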
Keywords/Search Tags: video retrieval, classification, motion, distance learning