
Disparate information fusion in the dissimilarity framework

Posted on: 2011-05-31
Degree: Ph.D.
Type: Dissertation
University: The Johns Hopkins University
Candidate: Ma, Zhiliang
GTID: 1448390002957231
Subject: Applied Mathematics
Abstract/Summary:
We study the problem of combining multiple disparate types of data to improve performance in various inferential tasks, and we propose the dissimilarity framework, which consists of two steps: (1) calculate one or more dissimilarity matrices for each data source; and (2) combine all the dissimilarity matrices for the inferential purpose. In the first step, we take advantage of the knowledge of experts in each area and unify the disparate types of data in the dissimilarity space. In this dissertation, we focus on developing methods for combining multiple dissimilarity matrices.

One of the most widely used approaches to working with dissimilarity data is to convert the dissimilarity matrix into a configuration of points (called the embedding) through multidimensional scaling, and then to build statistical models on the embedding. To use observations collected later, called the out-of-sample data, one could redo the embedding and modeling process, but this is inefficient. We study the alternative of out-of-sample embedding and develop an approach, OOSIM, that inserts the out-of-sample objects into the existing embedding by minimizing the sum of squared differences between the dissimilarities and the corresponding Euclidean distances. Iterative majorization is used to minimize the criterion function. The simulation experiment suggests that OOSIM is a natural extension of de Leeuw's multidimensional scaling procedure, SMACOF, which minimizes the raw stress.

We develop the J-function approach to combine multiple dissimilarity matrices in the space of the Cartesian product of the embeddings. Because this space is high-dimensional, we introduce a novel supervised dimensionality reduction method.
The simulation and real data results show that our approach can improve classification accuracy compared to the alternatives of principal components analysis and no dimensionality reduction at all.

We also consider information fusion from a different perspective. When objects are measured under multiple conditions (e.g., indoor versus outdoor lighting for face recognition, or translations into multiple languages for document matching), the challenging task is to fuse the data and use all the available information for inferential purposes. We consider two exploitation tasks: (1) how to determine whether a set of feature vectors represents a single object measured under different conditions; and (2) how to create a classifier from training data collected under one condition in order to classify objects measured under other conditions. The key to both problems is to transform all sets of feature vectors into one commensurate space, where the (transformed) feature vectors are comparable and can be treated as if they were collected under the same condition. Toward this end, we study Procrustes analysis and develop a new approach. We illustrate our methodology on English and French documents collected from Wikipedia, demonstrating superior performance compared to that obtained via the standard Procrustes transformation.

We introduce a way to generate a collection of 3D shapes from different groups, and study the problem of combining multiple dissimilarity matrices derived from the same set of shapes for classification. Experimental results show that different dissimilarity measures may capture different aspects of the information, and consequently that combining all the dissimilarity matrices in an optimal way yields higher classification accuracy than using any single dissimilarity matrix alone.
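The out-of-sample embedding idea in the abstract can be illustrated with a minimal sketch. It is not the dissertation's OOSIM procedure: it assumes a single new object, unit weights, and a fixed in-sample embedding `X`, and the names `embed_out_of_sample` and `raw_stress` are hypothetical. The update is the standard majorization step for raw stress restricted to one free point: the new location is the centroid of `X` plus the average of the dissimilarity-scaled unit directions from each in-sample point to the current estimate.

```python
import numpy as np

def raw_stress(X, delta, y):
    """Sum of squared differences between dissimilarities and distances to y."""
    return float(np.sum((delta - np.linalg.norm(X - y, axis=1)) ** 2))

def embed_out_of_sample(X, delta, n_iter=500, tol=1e-12):
    """Place one new point into a fixed embedding X (n x d) by iterative
    majorization, given its dissimilarities delta (n,) to the n fixed points.
    Illustrative sketch only (single point, unit weights)."""
    X = np.asarray(X, dtype=float)
    delta = np.asarray(delta, dtype=float)
    centroid = X.mean(axis=0)
    y = centroid.copy()                      # start at the centroid
    for _ in range(n_iter):
        diff = y - X                         # vectors from each x_i to y
        dist = np.linalg.norm(diff, axis=1)
        safe = np.where(dist > 0, dist, 1.0) # avoid division by zero
        unit = diff / safe[:, None]          # zero row where y == x_i
        # Minimizer of the majorizing quadratic at the current estimate
        y_new = centroid + (delta[:, None] * unit).mean(axis=0)
        if np.linalg.norm(y_new - y) < tol:
            y = y_new
            break
        y = y_new
    return y

# Usage: recover a point from its exact distances to four fixed points
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
target = np.array([0.3, 0.7])
delta = np.linalg.norm(X - target, axis=1)
y_hat = embed_out_of_sample(X, delta)
```

Each iteration cannot increase the raw stress, which is the property that makes majorization attractive here. With noisy dissimilarities the result is a local minimizer and may depend on the starting point.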
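As a point of reference for the commensurate-space idea, the sketch below shows the standard orthogonal Procrustes transformation, the baseline named in the abstract, not the new approach the dissertation develops. It assumes matched rows in `X` and `Y` (the same objects measured under two conditions, already embedded in the same dimension); the function name `orthogonal_procrustes_map` is hypothetical.

```python
import numpy as np

def orthogonal_procrustes_map(X, Y):
    """Return the orthogonal matrix Q minimizing ||X @ Q - Y||_F,
    where row i of X and row i of Y describe the same object."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Usage: condition-2 data is a rotated/reflected copy of condition-1 data
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                      # objects under condition 1
Q_true, _ = np.linalg.qr(rng.normal(size=(3, 3))) # unknown orthogonal map
Y = X @ Q_true                                    # same objects, condition 2

Q = orthogonal_procrustes_map(X, Y)
X_mapped = X @ Q                                  # now comparable with Y
residual = np.linalg.norm(X_mapped - Y)
```

Once both sets live in one space, matched objects can be tested by the distance between their transformed feature vectors, and a classifier trained on `Y` can be applied to `X_mapped`.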
Keywords/Search Tags:Dissimilarity, Disparate, Data, Combining multiple, Information, Fusion