
Extension of support vector machines for imprecise data using fuzzy set theory

Posted on: 2009-10-28
Degree: Ph.D.
Type: Dissertation
University: State University of New York at Binghamton
Candidate: Joshi, Atul Vamdev
Full Text: PDF
GTID: 1448390005456333
Subject: Statistics
Abstract/Summary:
Support Vector Machines (SVMs) have been successfully applied to manufacturing, finance and healthcare for tasks including defect detection, fraud detection and medical diagnosis. Typically, SVMs require the training data to be represented as real-valued vectors that consist of crisp numbers. However, in many real-life applications, the training data comes with intrinsic uncertainty that might be the result of imprecise measuring instruments or human judgments and observations. To date, the problems of missing values and imprecise training data have not been explicitly addressed within the SVM methodology.
This research uses the geometrical framework to extend the classical SVM framework to imprecise data, where imprecision is handled by fuzzy numbers and imprecise data is processed using fuzzy arithmetic. The proposed research methods extend the reduced convex hull framework to fuzzy data, where the feature values of a classification object are represented by fuzzy numbers. The mathematical framework of the fuzzy reduced convex hull and the algorithmic procedure to estimate the nearest points between two fuzzy convex hulls are presented, along with the associated proofs. Two geometric algorithms, Gilbert's algorithm and the Schlesinger-Kozinec algorithm, are extended and implemented with constrained fuzzy arithmetic. It is shown that, without explicit formation of the fuzzy convex hulls, the fuzzy classifier can be determined by the proposed algorithms in quadratic time. Furthermore, a new metric of model certainty is proposed to estimate the predictive uncertainty of the binary fuzzy classifier trained with the proposed algorithms.
Extensive experimentation with simulated datasets and a benchmark real-life dataset empirically tested the convergence of both algorithms. Additionally, different fuzzification schemes, including the forced fuzzification scheme and the opted fuzzification scheme, are evaluated using these datasets.
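The geometric view referred to above, in which a linear classifier is obtained from the nearest points of two convex hulls, can be illustrated for ordinary crisp data. The following is a minimal sketch of a Gilbert-style iteration, not the dissertation's implementation: it uses plain (non-reduced) convex hulls and ordinary arithmetic rather than constrained fuzzy arithmetic, and the function names (`nearest_points`, `lerp`, and so on) are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def lerp(a, b, t):
    """Return the point a + t * (b - a)."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def nearest_points(X, Y, tol=1e-9, max_iter=10000):
    """Approximate the closest pair (p, q) with p in conv(X) and q in conv(Y).

    X and Y are lists of crisp points. The hulls are never built explicitly;
    each iteration only scans the points for a support vertex.
    """
    p, q = list(X[0]), list(Y[0])
    for _ in range(max_iter):
        w = sub(p, q)
        # Support vertices: the pair (x, y) that minimises <w, x - y>.
        x = min(X, key=lambda v: dot(w, v))
        y = max(Y, key=lambda v: dot(w, v))
        z = sub(x, y)
        # Optimality gap; it is zero exactly when w is the shortest vector.
        gap = dot(w, w) - dot(w, z)
        if gap <= tol:
            break
        d = sub(z, w)
        # Step that minimises ||w + t * d|| over t in [0, 1].
        t = min(1.0, gap / dot(d, d))
        p, q = lerp(p, x, t), lerp(q, y, t)
    return p, q


class_a = [[-1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
class_b = [[3.0, 1.0], [3.0, -1.0], [2.0, 0.0]]
p, q = nearest_points(class_a, class_b)
normal = sub(p, q)                      # direction of the separating hyperplane
midpoint = lerp(p, q, 0.5)              # a point on the separating hyperplane
```

For these two triangles the closest pair is (0, 0) and (2, 0), so the separating line in this sketch is the perpendicular bisector of that segment.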
It is concluded that, irrespective of the imprecision in the data, the classification accuracy of the binary fuzzy linear classifier is on par with that of the crisp linear classifier. Additionally, when the classification results are interpreted in terms of class membership degrees, ambiguity in the output could provide more insight into the classification task. The class membership values could be interpreted as the degrees to which an ambiguous data point belongs to both classes. This interpretation is unique to the methodology presented in this research and is not possible when crisp data is used to train a classifier. The proposed algorithms will also be extremely useful in the case of incomplete training data, where missing values could be represented by fuzzy numbers. Finally, the proposed approaches could be used to incorporate expert knowledge into the training data by means of linguistic expressions, which could then be represented by fuzzy numbers and fuzzy feature vectors.
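As an illustration of how an imprecise measurement, a missing value, or a linguistic expression might be encoded as a fuzzy number, the sketch below uses triangular fuzzy numbers. The triangular shape, the class name `TriangularFuzzyNumber`, the example feature range and the term "high" are all assumptions made for this example. The addition shown is standard fuzzy arithmetic, not the constrained fuzzy arithmetic used in the dissertation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    left: float   # membership falls to 0 here
    peak: float   # membership equals 1 here
    right: float  # membership falls to 0 here

    def membership(self, x):
        """Degree in [0, 1] to which x belongs to this fuzzy number."""
        if x == self.peak:
            return 1.0
        if self.left < x < self.peak:
            return (x - self.left) / (self.peak - self.left)
        if self.peak < x < self.right:
            return (self.right - x) / (self.right - self.peak)
        return 0.0

    def alpha_cut(self, alpha):
        """Interval of values with membership at least alpha (0 < alpha <= 1)."""
        lo = self.left + alpha * (self.peak - self.left)
        hi = self.right - alpha * (self.right - self.peak)
        return lo, hi

    def __add__(self, other):
        # Standard (unconstrained) addition of triangular fuzzy numbers.
        return TriangularFuzzyNumber(
            self.left + other.left,
            self.peak + other.peak,
            self.right + other.right,
        )


crisp = TriangularFuzzyNumber(5.0, 5.0, 5.0)      # a crisp reading of 5
measured = TriangularFuzzyNumber(4.5, 5.0, 5.5)   # "about 5", instrument error 0.5
missing = TriangularFuzzyNumber(0.0, 5.0, 10.0)   # unknown value on an assumed 0-10 scale
high = TriangularFuzzyNumber(6.0, 8.0, 10.0)      # hypothetical linguistic term "high"

# A fuzzy feature vector mixes these representations in one training example.
fuzzy_feature_vector = [measured, missing, high]
```

A crisp value is the degenerate case in which all three parameters coincide, so crisp and fuzzy features can share one representation.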
Keywords/Search Tags: Fuzzy, Data, Using, Represented