Analysis methods for large batch and process data sets: Theory and applications

Posted on: 2007-02-05
Degree: Ph.D.
Type: Dissertation
University: Arizona State University
Candidate: Cramer, Jeffrey Alan
Full Text: PDF
GTID: 1448390005961995
Subject: Chemistry
Abstract/Summary:
There are many areas of chemistry in which large data sets are produced in distinct groupings (i.e., "batch" data) or in which large amounts of data must be evaluated continuously (i.e., "process" data). Two novel techniques that can enhance both batch and process data analysis have been developed: a fractal-based analysis for outlier detection and a wavelet-based data compression method that doubles as an accelerator for subsequent multiway data analyses.

Nonlinear spectroscopic data pose challenges to typical spectral analyses, not the least of which is the automatic detection of spectra that deviate greatly from a predetermined norm. Such "outliers" should be easy to identify and exclude during preprocessing using straightforward multivariate data processing tools, but data nonlinearity renders common outlier diagnostics (based on Mahalanobis distance or score distance tests) inappropriate. To compensate, an outlier detection technique based on the fractal dimension of data sets' score projections has been proposed and effectively employed. Outlying scores in the score space, while not necessarily deviating from overall score clusters in a conventional sense, will cause detectable fluctuations in the cluster's fractal dimension, thus providing a reliable identification trait.

The acquisition, processing, and archiving of large multidimensional data sets require generally undesirable amounts of data storage space and analysis time. To address this, data compression by means of wavelet transforms has been proposed. Because wavelet transforms are linear (preserving underlying linear factors), chemometric results obtained in the wavelet domain from a compressed cube can be inversely transformed to derive approximate models in the original measurement domain. This technique is effective in increasing data storage capacity and accelerating multiway analysis.
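The linearity argument behind the wavelet compression can be illustrated with a minimal sketch. It is not the dissertation's implementation: the Haar wavelet, the rank-one toy data, the known scores, and every name below (`haar_forward`, `haar_inverse`, `compress`, `decompress`, `scores`, `profile`) are assumptions chosen for illustration. The sketch compresses each row of a small data matrix, estimates a loading by least squares in the wavelet domain, and inverse-transforms that loading to obtain an approximate model in the original measurement domain.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Full orthonormal Haar decomposition of a sequence whose length is a power of two."""
    out = list(signal)
    n = len(out)
    while n > 1:
        half = n // 2
        approx = [(out[2 * i] + out[2 * i + 1]) / SQRT2 for i in range(half)]
        detail = [(out[2 * i] - out[2 * i + 1]) / SQRT2 for i in range(half)]
        out[:n] = approx + detail  # coarse coefficients first, details after
        n = half
    return out

def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        out[:2 * n] = merged
        n *= 2
    return out

def compress(row, keep):
    """Keep only the first `keep` (coarsest) Haar coefficients."""
    return haar_forward(row)[:keep]

def decompress(kept, length):
    """Zero-pad the discarded coefficients and return to the original domain."""
    return haar_inverse(list(kept) + [0.0] * (length - len(kept)))

# Hypothetical rank-one data: each row is a score times a shared profile.
scores = [1.0, 2.0, 3.0]
profile = [1.0, 1.2, 2.0, 2.1, 3.0, 2.9, 2.0, 1.8]
data = [[t * p for p in profile] for t in scores]

keep = 4  # store half of the coefficients per row
compressed = [compress(row, keep) for row in data]

# Least-squares loading estimated entirely in the wavelet domain
# (scores are assumed known here to keep the sketch short).
norm = sum(t * t for t in scores)
loading_w = [
    sum(t * row[j] for t, row in zip(scores, compressed)) / norm
    for j in range(keep)
]

# Inverse-transform the wavelet-domain loading to approximate the profile.
profile_approx = decompress(loading_w, len(profile))
max_error = max(abs(a - b) for a, b in zip(profile, profile_approx))
print(profile_approx, max_error)
```

Because the transform is linear, each compressed row is still the same score times a compressed profile, so the factor structure survives compression; only the detail discarded from the profile is lost. A real multiway analysis would estimate scores and loadings together, and a smoother wavelet family may compress spectra better than Haar.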
Keywords/Search Tags: Data, Large, Batch, Process