Font Size: a A A

Novel techniques for data mining and pattern recognition

Posted on:2005-05-30Degree:Ph.DType:Dissertation
University:University of DelawareCandidate:Woody, Nathaniel AFull Text:PDF
GTID:1458390008488306Subject:Chemistry
Abstract/Summary:
Pattern recognition can be defined as attempting to quantify perceived similarities between objects. These objects can include such things as molecules, proteins, mixtures containing some common components, images or pixels in images. The similarities between these objects are quanitified with measurements ranging from spectroscopy to physical measurements to qualitative measures of taste or sight. Pattern recognition systems must then combine all of these measurements into a coherent predictive model that can analyze new samples and determine the class to which they belong.; Chemical data have several troubling aspects that make pattern recognition particularly difficult. These problems include vastly underdetermined data, low signal-to-noise (or more commonly, low information-to-variance) ratios, and complications arising from complicated mixtures of components. These problems tend to confound the direct application of statistical pattern recognition methods to chemical datasets.; In this dissertation, the problem of pattern recognition of chemical data is explored in two ways. First, orthogonal transformations are explored as a way to reduce the dimensionality problem and simultaneously increase the information-to-variance ratio. PCR and PLS are compared as alternative dimensionality reduction procedures, followed by the Wavelet Transform. The Wavelet Transform is demonstrated to be a method that can greatly increase the information to variance ratio, but without reducing dimensionality.; The second approach explored here is the application of probabilistic networks to chemical data. Probabilistic systems are advantageous because of their inherent ability to model uncertainty and are generally robust to noise. These systems are combined with variable and subspace selection in order to reduce the dimensionality or complexity of the pattern recognition problem. Probabilistic Bayesian networks are used to model continuous variables using discretization and the novel application of PLS. The application of Bayesian networks is shown to be advantageous in problems with missing data and for multivariate chemical situations.
Keywords/Search Tags:Pattern recognition, Data, Chemical, Application
Related items