Font Size: a A A

Fast and accurate estimation for astrophysical problems in large databases

Posted on:2011-05-22Degree:Ph.DType:Thesis
University:Carnegie Mellon UniversityCandidate:Richards, Joseph WFull Text:PDF
GTID:2448390002454518Subject:Statistics
Abstract/Summary:
A recent flood of astronomical data has created much demand for sophisticated statistical and machine learning tools that can rapidly draw accurate inferences from large databases of high-dimensional data. In this Ph.D. thesis, methods for statistical inference in such databases will be proposed, studied, and applied to real data. I use methods for low-dimensional parametrization of complex, high-dimensional data that are based on the notion of preserving the connectivity of data points in the context of a Markov random walk over the data set. I show how this simple parameterization of data can be exploited to: define appropriate prototypes for use in complex mixture models, determine data-driven eigenfunctions for accurate nonparametric regression, and find a set of suitable features to use in a statistical classifier. In this thesis, methods for each of these tasks are built up from simple principles, compared to existing methods in the literature, and applied to data from astronomical all-sky surveys. I examine several important problems in astrophysics, such as estimation of star formation history parameters for galaxies, prediction of redshifts of galaxies using photometric data, and classification of different types of supernovae based on their photometric light curves. Fast methods for high-dimensional data analysis are crucial in each of these problems because they all involve the analysis of complicated high-dimensional data in large, all-sky surveys. Specifically, I estimate the star formation history parameters for the nearly 800,000 galaxies in the Sloan Digital Sky Survey (SDSS) Data Release 7 spectroscopic catalog, determine redshifts for over 300,000 galaxies in the SDSS photometric catalog, and estimate the types of 20,000 supernovae as part of the Supernova Photometric Classification Challenge. Accurate predictions and classifications are imperative in each of these examples because these estimates are utilized in broader inference problems involving the estimation of important cosmological parameters that describe the history, fate and nature of the Universe.
Keywords/Search Tags:Data, Estimation, Accurate, Large
Related items