Extension of support vector machines for imprecise data using fuzzy set theory

Posted on:2009-10-28

Degree:Ph.D

Type:Dissertation

University:State University of New York at Binghamton

Candidate:Joshi, Atul Vamdev

GTID:1448390005456333

Subject:Statistics

Abstract/Summary:

Support Vector Machines (SVMs) have been successfully applied in manufacturing, finance and healthcare to tasks including defect detection, fraud detection and medical diagnosis. Typically, SVMs require the training data to be represented as real-valued vectors of crisp numbers. However, in many real-life applications, the training data carries intrinsic uncertainty that may result from imprecise measuring instruments or from human judgments and observations. To date, the problems of missing values and imprecise training data have not been explicitly addressed within the SVM methodology.

This research uses the geometrical framework to extend the classical SVM to imprecise data, where imprecision is modeled by fuzzy numbers and imprecise data is processed using fuzzy arithmetic. The proposed methods extend the reduced convex hull framework to fuzzy data, in which the feature values of a classification object are represented by fuzzy numbers. The mathematical framework of the fuzzy reduced convex hull and the algorithmic procedure for estimating the nearest points between two fuzzy convex hulls are presented, along with the associated proofs. Two geometric algorithms, Gilbert's algorithm and the Schlesinger-Kozinec algorithm, are extended and implemented with constrained fuzzy arithmetic. It is shown that, without explicit formation of the fuzzy convex hulls, the fuzzy classifier can be determined using the proposed algorithms in quadratic time complexity. Furthermore, a new metric of model certainty is proposed to estimate the predictive uncertainty of the binary fuzzy classifier trained with the proposed algorithms.

Extensive experimentation with simulated datasets and a benchmark real-life dataset empirically tested the convergence of both algorithms. Additionally, different fuzzification schemes, including the forced fuzzification scheme and the opted fuzzification scheme, are evaluated using these datasets. It is concluded that, irrespective of the imprecision in the data, the classification accuracy of the binary fuzzy linear classifier is on par with that of the crisp linear classifier. Additionally, when the classification results are interpreted in terms of class membership degrees, ambiguity in the output can provide more insight into the classification task. The class membership values can be interpreted as the degrees to which an ambiguous data point belongs to both classes. This interpretation is unique to the methodology presented in this research and is not possible when crisp data is used to train a classifier. The proposed algorithms will also be useful for incomplete training data, where missing values can be represented by fuzzy numbers. Finally, the proposed approaches could be used to incorporate expert knowledge into the training data by means of linguistic expressions, which can then be represented by fuzzy numbers and fuzzy feature vectors.
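
For orientation, the geometric idea behind the abstract can be sketched on crisp data. The snippet below is a minimal illustration, assuming ordinary (non-reduced) convex hulls of two linearly separable point sets; it is not the dissertation's fuzzy extension and does not use constrained fuzzy arithmetic. It follows the general shape of Gilbert's algorithm: the nearest points of the two hulls are found by repeated support steps over the input points, so neither hull is formed explicitly. The function names `nearest_points` and `linear_classifier`, the tolerance `eps` and the iteration cap `max_iter` are hypothetical choices made for this sketch.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_points(
    xs: Sequence[Vector],
    ys: Sequence[Vector],
    eps: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[Vector, Vector]:
    """Gilbert-style iteration for the nearest points of conv(xs) and conv(ys).

    Assumption for this sketch: crisp points and plain convex hulls.
    """
    p, q = tuple(xs[0]), tuple(ys[0])  # current points inside each hull
    for _ in range(max_iter):
        w = tuple(a - b for a, b in zip(p, q))
        # Support step on the Minkowski difference conv(xs) - conv(ys):
        # the vertex minimizing <w, x - y> splits into two linear scans.
        x_star = min(xs, key=lambda x: dot(w, x))
        y_star = max(ys, key=lambda y: dot(w, y))
        z = tuple(a - b for a, b in zip(x_star, y_star))
        # Optimality gap <w, w - z>; it is zero at the minimum-norm point.
        if dot(w, w) - dot(w, z) <= eps:
            break
        # Move to the point of the segment [w, z] closest to the origin.
        d = tuple(a - b for a, b in zip(w, z))
        lam = min(1.0, max(0.0, dot(w, d) / dot(d, d)))
        p = tuple((1 - lam) * a + lam * b for a, b in zip(p, x_star))
        q = tuple((1 - lam) * a + lam * b for a, b in zip(q, y_star))
    return p, q


def linear_classifier(p: Vector, q: Vector) -> Tuple[Vector, float]:
    """Hyperplane <w, x> + b = 0 that perpendicularly bisects the segment [q, p]."""
    w = tuple(a - b for a, b in zip(p, q))
    b = -(dot(w, p) + dot(w, q)) / 2.0
    return w, b
```

Calling `nearest_points(xs, ys)` and passing the result to `linear_classifier` yields a weight vector `w` and offset `b`; a point `x` is assigned to the first class when `dot(w, x) + b` is positive. Each iteration scans every input point once, which is what allows the hulls to remain implicit.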

Keywords/Search Tags:

Fuzzy, Data, Using, Represented

Related items

1	Studies Of Fuzzy Description Logics Supporting Representation Of Fuzzy Data Types
2	Knowledge discovery from structured data represented by graphs
3	Design And Realization Of Fuzzy Decision Making Data Analysis Platform Based On Fuzzy Regression
4	Research Of XML Data Model Based On Fuzzy Data
5	Studies Of Construction And Storage Of Fuzzy OWL Ontologies Supported By Databases
6	Development Of Practical Software For Bad Data Identification In Power System
7	Research On Sample Data Based Fuzzy Rules Extraction Method And Its Application
8	Research Of Fuzzy Association Rules Algorithm Based On Data-driven FCM
9	Studies On Approaches Of Modeling And Querying Fuzzy Spatiotemporal Data Based On XML
10	Research On Motion Retargeting Method For Motion Data Represented By Joint Position