
CLIFF: Finding Prototypes for Nearest Neighbor Algorithms with Application to Forensic Trace Evidence

Posted on: 2011-10-12  Degree: M.S.  Type: Thesis
University: West Virginia University  Candidate: Peters, Fayola  Full Text: PDF
GTID: 2448390002956129  Subject: Sociology
Abstract/Summary:
Prototype Learning Schemes (PLS) started appearing over 30 years ago (Hart 1968, [22]) in order to alleviate the drawbacks of nearest neighbor classifiers (NNC). These drawbacks include: (1) computation time, (2) storage requirements, (3) the effects of outliers on the classification results, (4) the negative effect of data sets with non-separable and/or overlapping classes, and (5) a low tolerance for noise.

To that end, all PLS have endeavored to create or select a good representation of the training data that is a mere fraction of the size of the original training data. In most of the literature this fraction is approximately 10%. The aim of this work is to present solutions for these drawbacks of NNC. To accomplish this, the design, implementation and evaluation of CLIFF are described. The basic structure of the CLIFF algorithm involves a ranking measure that ranks the values of each attribute in a training set. The values with the highest ranks are then used as a rule or criterion to select the instances (prototypes) that obey it. Intuitively, these prototypes best represent the region or neighborhood they come from, and so are expected to reduce the drawbacks of NNC, particularly drawbacks 3, 4 and 5 above.

With seven (7) standard data sets from the UCI repository [17], the results of this work demonstrate that, in most cases, CLIFF performs statistically the same as or better than the 1NN rule classifier as well as three other PLS. Finally, in the forensic case study, which uses a data set composed of the infrared spectra of the clear coat layer of a range of cars, the performance analysis showed strong results, with nearly 100% of the validation set finding the right target. Also, prototype learning is applied successfully, with a reduction in brittleness while maintaining statistically indistinguishable results on the validation sets.
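
The selection idea described in the abstract (rank the values of each attribute, then keep the instances that satisfy the top-ranked values) can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not define the ranking measure, so the score used here (the squared in-class frequency of a value divided by the sum of its in-class and out-of-class frequencies) is an assumption. The function names `rank_values`, `select_prototypes` and `classify_1nn`, the `min_matches` parameter, the Hamming distance and the toy data are likewise hypothetical, and attributes are assumed to be already discretized.

```python
from collections import Counter


def rank_values(rows, labels, target, attr):
    """Score every value of one attribute for one class.

    ASSUMPTION: the score p_in**2 / (p_in + p_out) is illustrative only;
    the abstract does not specify CLIFF's ranking measure.
    """
    in_class = Counter(r[attr] for r, y in zip(rows, labels) if y == target)
    out_class = Counter(r[attr] for r, y in zip(rows, labels) if y != target)
    n_in = sum(in_class.values())
    n_out = sum(out_class.values()) or 1  # avoid division by zero
    scores = {}
    for value, count in in_class.items():
        p_in = count / n_in
        p_out = out_class.get(value, 0) / n_out
        scores[value] = p_in ** 2 / (p_in + p_out)
    return scores


def select_prototypes(rows, labels, min_matches=None):
    """Keep instances that match the top-ranked value of their own class
    on at least `min_matches` attributes (hypothetical selection rule)."""
    n_attrs = len(rows[0])
    if min_matches is None:
        min_matches = n_attrs
    prototypes = []
    for target in sorted(set(labels)):
        # Best-scoring value of each attribute for this class.
        best = []
        for attr in range(n_attrs):
            scores = rank_values(rows, labels, target, attr)
            best.append(max(scores, key=scores.get))
        for r, y in zip(rows, labels):
            if y != target:
                continue
            matches = sum(r[a] == best[a] for a in range(n_attrs))
            if matches >= min_matches:
                prototypes.append((r, y))
    return prototypes


def classify_1nn(prototypes, query):
    """1NN over the prototypes, using Hamming distance on discrete values."""
    def distance(proto):
        return sum(a != b for a, b in zip(proto[0], query))
    return min(prototypes, key=distance)[1]


# Toy data (made up for illustration): two discretized attributes, two classes.
rows = [("low", "red"), ("low", "red"), ("low", "blue"),
        ("high", "blue"), ("high", "blue"), ("high", "red")]
labels = ["a", "a", "a", "b", "b", "b"]

prototypes = select_prototypes(rows, labels)
prediction = classify_1nn(prototypes, ("low", "red"))
```

On this toy data the sketch keeps four of the six training rows: the two rows that deviate from their class's top-ranked values are dropped, and the 1NN rule is then applied to the reduced set. The share of data retained here says nothing about the reduction rates reported in the thesis.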
Keywords/Search Tags:CLIFF, PLS, Drawbacks