
Optimized dictionary design and classification using the matching pursuits dissimilarity measure

Posted on: 2010-09-30 | Degree: Ph.D. | Type: Dissertation
University: University of Florida | Candidate: Mazhar, Raazia | Full Text: PDF
GTID: 1448390002482521 | Subject: Computer Science
Abstract/Summary:
Discrimination-based classifiers differentiate between two classes by drawing a decision boundary between their data members in the feature domain. These classifiers are capable of correctly labeling test data that belongs to the same distribution as the training data. However, since the decision boundary is meaningless beyond the training points, the class label assigned to an outlier with respect to this extended decision boundary is essentially arbitrary. Discrimination-based classifiers therefore lack a mechanism for detecting outliers in the test data. To counter this problem, a prototype-based classifier may be used, which assigns a class label to a test point based on its similarity to the prototype of that class. If a test point is dissimilar to all class prototypes, it may be considered an outlier.

Prototype-based classifiers are usually clustering-based methods. They therefore require a dissimilarity criterion, both to cluster the training data and to assign class labels to test data. Euclidean distance is a commonly used dissimilarity criterion. However, the Euclidean distance may not give accurate shape-based comparisons of very high-dimensional signals. This can be problematic for classification applications in which high-dimensional signals are grouped into classes based on shape similarities. A reliable shape-based dissimilarity measure is therefore desirable.

To build reliable prototype-based classifiers that can use shape-based information for classification, we have developed a matching pursuits dissimilarity measure (MPDM). The MPDM is capable of performing shape-based comparisons between very high-dimensional signals. The MPDM extends the matching pursuits (MP) algorithm [1], which is a well-known signal approximation method.
The MPDM is a versatile measure, as it can also be adopted for magnitude-based comparisons between signals, similar to the Euclidean distance.

The MPDM has been used with the competitive agglomeration (CA) fuzzy clustering algorithm [2] to develop a prototype-based probabilistic classifier, called CAMP. The CAMP algorithm is the first method of its kind, as it builds a bridge between clustering and matching pursuits algorithms. Preliminary experimental results also demonstrate its superior performance over a neural network classifier and over a prototype-based classifier using the Euclidean distance. The performance of CAMP has been tested on high-dimensional synthetic data and also on real landmine detection data.

The MPDM is also used to develop an automated dictionary learning algorithm for MP approximation of signals. This algorithm uses the MPDM and the CA clustering algorithm to learn the required number of dictionary elements during training. Under-utilized and replicated dictionary elements are gradually pruned to produce a compact dictionary without compromising its approximation capabilities. The experimental results show that the dictionary learned by our method is 60% smaller than those learned by existing dictionary learning algorithms, with the same approximation capabilities.
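
For readers unfamiliar with the matching pursuits (MP) algorithm that the MPDM builds on, the following is a minimal sketch of the standard greedy MP approximation. At each step it picks the dictionary atom most correlated with the current residual and subtracts that atom's contribution. It is not the MPDM, CAMP, or the dissertation's dictionary learning method, none of which are specified in this abstract. The function names, the toy dictionary `atoms`, and the signal `x` are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, dictionary, n_iter):
    """Greedy MP approximation of `signal` over unit-norm `dictionary` atoms.

    Returns (selected atom indices, coefficients, final residual).
    """
    residual = list(signal)
    indices, coeffs = [], []
    for _ in range(n_iter):
        # Project the current residual onto every atom.
        projections = [dot(residual, atom) for atom in dictionary]
        # Select the atom with the largest absolute projection.
        best = max(range(len(dictionary)), key=lambda k: abs(projections[k]))
        c = projections[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - c * a for r, a in zip(residual, dictionary[best])]
        indices.append(best)
        coeffs.append(c)
    return indices, coeffs, residual


# Hypothetical toy dictionary: three mutually orthogonal unit-norm atoms in R^4.
atoms = [normalize(a) for a in ([1, 1, 0, 0], [0, 0, 1, 1], [1, -1, 0, 0])]
x = [3.0, 1.0, 0.5, 0.5]  # illustrative signal lying in the span of the atoms

idx, coef, res = matching_pursuit(x, atoms, n_iter=3)
print("selected atoms:", idx)
print("residual norm:", math.sqrt(dot(res, res)))
```

Because the toy atoms are orthonormal and `x` lies in their span, three iterations drive the residual to zero up to rounding error. With a redundant, non-orthogonal dictionary, the usual setting for MP, the residual generally shrinks gradually and the same atom may be selected more than once.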
Keywords/Search Tags:Dictionary, Class, Matching pursuits, Data, Dissimilarity, Decision boundary, MPDM, Algorithm