Adaptive cluster detection

Posted on: 2011-01-25
Degree: Ph.D.
Type: Dissertation
University: Carnegie Mellon University
Candidate: Friedenberg, David Aaron
Full Text: PDF
GTID: 1448390002454741
Subject: Statistics
Abstract/Summary:
The next generation of telescopes will acquire terabytes of image data on a nightly basis. Collectively, these large images will contain billions of interesting objects, which astronomers call sources. The astronomers' task is to construct a catalog detailing the coordinates and other properties of the sources. The source catalog is the primary data product for most telescopes and is an important input for testing new astrophysical theories, but to construct the catalog one must first detect the sources. Existing algorithms for catalog creation are effective at detecting sources, but do not have rigorous statistical error control. At the same time, there are several multiple testing procedures that provide rigorous error control, but they are not designed to detect sources that are aggregated over several pixels. We propose a family of techniques that do both, by providing rigorous statistical error control on the aggregate objects themselves rather than on the pixels. We demonstrate the effectiveness of this approach on data from the Chandra X-ray Observatory Satellite. Our techniques effectively control the rate of false sources, yet still detect almost all of the sources detected by procedures that lack such rigorous error control and that have the advantage of additional data in the form of follow-up observations, which may not be available for upcoming large telescopes. In fact, we even detect two new sources that were missed by previous studies.

The statistical methods we develop can be extended to problems beyond Astronomy, as we will illustrate with examples from Neuroimaging. We examine a series of high-resolution functional Magnetic Resonance Imaging (fMRI) experiments in which the goal is to detect bands of neural activity in response to visual stimuli presented to subjects in an fMRI scanner. We extend the methods developed for Astronomy problems so that we can detect two distinct types of activation regions in the brain with a probabilistic guarantee on the rate of falsely detected active regions.

Additionally, we examine the more general field of clustering and develop a framework for clustering algorithms based around diffusion maps. Diffusion maps can be used to project high-dimensional data into a lower-dimensional space while preserving much of the structure in the data. We demonstrate how diffusion maps can be used to solve clustering problems and examine the influence of tuning parameters on the results. We introduce two novel methods: the self-tuning diffusion map, which replaces the global scaling parameter in the typical diffusion map framework with a local scaling parameter, and an algorithm for automatically selecting tuning parameters based on a cross-validation-style score called prediction strength. The methods are tested on several example datasets.
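
As a rough illustration of the diffusion map idea with a local scaling parameter, the following is a minimal sketch and not the dissertation's implementation. It assumes that the local scale of each point is the distance to its k-th nearest neighbour and that the kernel is a Gaussian scaled by the product of the two local scales; the function name `diffusion_map` and the parameters `k`, `t`, and `n_components` are hypothetical choices made for this example.

```python
import numpy as np

def diffusion_map(X, n_components=2, k=7, t=1):
    """Embed the rows of X with a diffusion map that uses a local kernel scale.

    Assumption for this sketch: the local scale of a point is the distance
    to its k-th nearest neighbour (requires more than k distinct points).
    """
    # Pairwise squared Euclidean distances, clamped at zero for round-off.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)

    # Column 0 of each sorted row is the point itself, so column k is the
    # squared distance to the k-th nearest neighbour.
    sigma = np.sqrt(np.sort(d2, axis=1)[:, k])

    # Gaussian kernel with a per-pair scale sigma_i * sigma_j in place of a
    # single global bandwidth.
    K = np.exp(-d2 / (sigma[:, None] * sigma[None, :]))

    # Symmetric normalisation D^{-1/2} K D^{-1/2}; it has the same
    # eigenvalues as the Markov matrix P = D^{-1} K.
    deg = K.sum(axis=1)
    A = K / np.sqrt(np.outer(deg, deg))

    # Eigen-decomposition, sorted by decreasing eigenvalue.
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Convert to right eigenvectors of P, drop the trivial constant one,
    # and weight each coordinate by its eigenvalue raised to the power t.
    psi = vecs / np.sqrt(deg)[:, None]
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
```

On two weakly connected groups of points, the sign of the first diffusion coordinate should separate the groups, so a standard clustering method applied to the embedded coordinates can recover them. The sketch assumes no duplicate points (a zero local scale would cause a division by zero) and does not attempt the automatic parameter selection by prediction strength described above.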
Keywords/Search Tags: Data, Detect, Error control, Sources, Methods