Font Size: a A A

Random graphs for structure discovery in high-dimensional data

Posted on:2006-04-19Degree:Ph.DType:Dissertation
University:University of MichiganCandidate:Costa, Jose Antonio OFull Text:PDF
GTID:1458390008958222Subject:Engineering
Abstract/Summary:
Originally motivated by computational considerations, we demonstrate how computational efficient and scalable graph constructions can be used to encode both statistical and spatial information and address the problems of dimension reduction and structure discovery in high-dimensional data, with provable results.; We discuss the asymptotic behavior of power weighted functionals of minimal Euclidean graphs, proving upper and lower bounds for the respective convergence rates and connecting them to the problem of nonparametric estimation of entropy.; We then extend the convergence results from Euclidean graphs to the setting of data that spans a high-dimensional space but which contain fundamental features that are concentrated on lower-dimensional subsets of this space---curves, surfaces or, more generally, lower-dimensional manifolds. In particular, we have developed a novel geometric probability approach to the problem of estimating intrinsic dimension and entropy of manifold data, based on asymptotic properties of graphs such as Minimal Spanning Trees or k-Nearest Neighbor graphs. Unlike previous solutions to this problem, we are able to prove statistical consistency of the obtained estimators for the wide class of Riemann submanifolds of an Euclidean space. We also propose a graph based dimensionality reduction method aimed at extracting lower dimensional features designed expressly to improve classification tasks, with applications to both supervised and semi-supervised learning problems.; Finally, using neighborhood graphs and the multidimensional scaling principle, we develop a general tool for dimensionality reduction in sensor networks, where communication constraints exist and distributed optimization is required. This tool is illustrated through an application to localization in sensor networks.
Keywords/Search Tags:Graphs, High-dimensional, Data
Related items