
Semantic Distance Metric Learning And Its Application In Multimedia Content Analysis

Posted on: 2011-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: B Xiao
Full Text: PDF
GTID: 2178360308452528
Subject: Signal and Information Processing
Abstract/Summary:
A good distance metric for the input data is crucial in many pattern recognition and machine learning applications, and is widely applied to image and video retrieval, biometrics, automatic image annotation, etc. However, it remains a great challenge to measure the similarity between data according to human perception and to reduce the gap between the low-level feature space and the high-level semantic space.

This thesis addresses a number of key issues in defining and learning distance metrics. The main focus is on supervised metric learning and on the covariance distance between patches and its manifold interpretation. The major innovations in this work are enumerated as follows:

1. Based on supervised metric learning, a new semantic distance metric learning framework is proposed. Our guiding principle is to preserve the local neighborhoods defined by a specially designed distance while maximizing the distances between data that are not in the same neighborhood in the semantic space. Without any assumption about the structure or the distribution of the input data, this can be done by solving an optimization problem with a relatively small amount of training data. Furthermore, the low-level feature space can be mapped to the high-level semantic space by a linear transformation. The proposed method can be used as a pre-processing step to help common machine learning algorithms find a better solution. This approach is not limited to clustering and classification problems, but is also suitable for regression problems.

2. Based on the covariance distance between image patches, a new action recognition approach has been developed on the basis of template matching. In real-world applications, especially when only a few training samples are available, it achieves relatively good results.
On the other hand, after the region of a certain key action in the video is determined, manifold curves rather than trajectories can be used to analyze videos and to find the start and end times of a certain event.

Experimental results on the publicly available FG-NET database show that the learned metric correctly discovers the semantic structure of the data even when the amount of training data is small. Most importantly, simple regression methods such as k nearest neighbors (kNN), combined with our learned metric, become quite competitive (and sometimes superior) in accuracy compared with state-of-the-art human age estimation approaches. The template matching technique based on covariance distance was applied in the event detection contest of TRECVID2008 and produced the best run in the "pointing" task.
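
To make the role of the linear transformation concrete, here is a minimal Python sketch of kNN regression in which distances are measured after mapping features through a matrix L. It is an illustration only, not the thesis's method: the matrix `L`, the function name `knn_regress`, and the toy data are all hypothetical, and `L` is fixed by hand rather than learned by the optimization problem described above.

```python
import numpy as np

def knn_regress(L, X_train, y_train, X_query, k=2):
    """Average the targets of the k nearest training points, where
    distance is Euclidean after the linear map x -> L @ x."""
    Z_train = X_train @ L.T          # mapped training features
    Z_query = X_query @ L.T          # mapped query features
    # Pairwise squared distances, shape (n_query, n_train)
    diff = Z_query[:, None, :] - Z_train[None, :, :]
    d2 = np.sum(diff ** 2, axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

# Hypothetical toy data: feature 0 tracks the target, feature 1 is nuisance.
X_train = np.array([[1.0, 0.0], [2.0, 50.0], [8.0, 1.0], [9.0, 49.0]])
y_train = np.array([10.0, 12.0, 40.0, 42.0])
X_query = np.array([[1.5, 49.0]])

# Plain Euclidean distance (L = identity) is dominated by the nuisance feature.
pred_euclidean = knn_regress(np.eye(2), X_train, y_train, X_query)

# A hand-picked L (an assumption, not a learned metric) that drops feature 1.
L_hand = np.array([[1.0, 0.0], [0.0, 0.0]])
pred_mapped = knn_regress(L_hand, X_train, y_train, X_query)

print(pred_euclidean, pred_mapped)
```

In this toy setup the mapped-space prediction draws on the two training points whose informative feature is closest to the query, whereas the plain Euclidean prediction mixes in a point with a very different target. A learned metric would aim to find such a map from labeled data instead of having it specified by hand.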
Keywords/Search Tags:Distance Metric, Manifold Learning, Supervised Metric Learning, k Nearest Neighbor (kNN), Covariance Metric, Human Age Estimation, Event Detection