
Structure-Driven Design of One-Class Classifiers and Extended Research

Posted on: 2012-04-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: A M Feng
Full Text: PDF
GTID: 1118330362958240
Subject: Computer application technology
Abstract/Summary:
The key problem in classifier design is how to improve generalization capability, that is, the ability to predict unseen samples from knowledge learned on the training data. One of the most effective ways to do so is to acquire prior knowledge, mostly from the data itself. In one-class classifier design, the absence or scarcity of abnormal samples greatly reduces the available information; in this much harder setting, it is all the more important to mine prior knowledge from the only available normal samples. This thesis concentrates on mining such prior knowledge for one-class classifier design, including local density information, structure information, cluster distribution, and the implicit information carried by the few available abnormal samples. The main contributions are as follows:

(1) Reviewed the key one-class classification methods from the viewpoints of density estimation and supporting-domain description, and proposed a hybrid model combining the two. The main one-class algorithms are summarized under these two viewpoints, and the improved and variant supporting-domain algorithms, divided into hyperplane-based and hypersphere-based families, are further analyzed by pinpointing the relations among them. By embedding local density into the supporting-domain method, a hybrid model suited to asymmetric data is obtained.

(2) Proposed the Structure-Driven Learning (SDL) strategy, designed the corresponding algorithm, the Structured One-Class Support Vector Machine (SOCSVM), and derived its error bound. Observing that existing one-class classifiers emphasize either local or global learning while neglecting the other, an integrated learning strategy called Structure-Driven Learning is proposed.
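For orientation, the prototype that the structure-driven methods build on is the standard one-class SVM, which learns a supporting domain around the normal class from normal samples alone. A minimal sketch using scikit-learn's `OneClassSVM` (a stand-in for the prototype only; the structure-embedding of SOCSVM itself is not shown, and the parameter values are illustrative assumptions):

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Toy "normal" data: a single Gaussian cluster (the target class).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Prototype one-class SVM; nu upper-bounds the fraction of
# training points treated as outliers.
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.5)
ocsvm.fit(X_train)

# Points near the training cloud are accepted (+1),
# points far from it are rejected (-1).
inside = ocsvm.predict(np.array([[0.0, 0.0]]))
outside = ocsvm.predict(np.array([[8.0, 8.0]]))
print(inside[0], outside[0])  # → 1 -1
```

The supporting-domain boundary here depends only on the kernel and on `nu`; the thesis's point is precisely that such a boundary ignores the global structure of the normal class, which SDL adds back in.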
By embedding the structure-distribution information into the prototype One-Class Support Vector Machine, a novel algorithm, SOCSVM, satisfying SDL is obtained, with stronger data description and generalization capability. The uniqueness and robustness of its optimal solution, along with the derived error bound, are proved theoretically. As the foundation of the subsequent research, results on toy and UCI benchmark datasets illustrate the validity of SOCSVM and its learning strategy.

(3) Proposed a new Structure-Driven Learning model for multi-cluster data. Structure-Driven Learning, implemented by embedding structure information, makes processing multi-cluster data quite different from the single-cluster case: it is more reasonable to consider the structure of each cluster separately than to simply treat all clusters as a whole. The Structured Large Margin One-Class Classifier (SLMOCC), an algorithm adapted to multi-cluster data, fulfills this strategy by constraining each point's Mahalanobis distance to the hyperplane. By maximizing the minimum Mahalanobis-distance margin, SLMOCC finds a more reasonable optimal hyperplane, owing to its finer, cluster-granular description. To extract the underlying data structure, SLMOCC applies Ward's agglomerative hierarchical clustering to the input data, or to the mapped data in kernel space. Experimental results on toy and UCI benchmark data demonstrate the improved generalization capability of SLMOCC.

(4) Developed a unified framework for one-class, binary-class, and imbalanced-data problems. On the basis of the SDL model, an integrated framework named the Biased Structure Data Description & Discrimination Machine (BSD3M) is developed by further embedding the class margin into SOCSVM and by optimizing the threshold of the hyperplane with the objective function of ν-SVM.
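The structure-extraction step that SLMOCC relies on — Ward's hierarchical clustering followed by a per-cluster covariance, which defines each cluster's own Mahalanobis metric — can be sketched as follows. This is a minimal illustration on synthetic data, not SLMOCC itself; the data, cluster count, and distances are assumptions for the example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import mahalanobis

# Two well-separated synthetic clusters of "normal" data.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(100, 2)),
    rng.normal([5.0, 5.0], 0.5, size=(100, 2)),
])

# Ward's agglomerative hierarchical clustering recovers the cluster structure.
Z = linkage(X, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")

# Each cluster's own covariance defines its Mahalanobis metric, so the
# distance of a candidate point reflects per-cluster shape rather than
# one global covariance.
dists = []
for k in (1, 2):
    Xk = X[labels == k]
    mu = Xk.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Xk, rowvar=False))
    dists.append(mahalanobis(np.array([0.0, 0.0]), mu, cov_inv))

print([round(d, 2) for d in dists])
```

The origin is close to one cluster in its own metric and far from the other — exactly the cluster-granular description that treating all the data with a single covariance would blur.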
Relaxing the classical SVM constraint of equal numbers of positive and negative support vectors makes it possible to control the position of the hyperplane on demand. With suitably predefined parameters in the objective and discrimination functions, BSD3M can not only handle the one-class problem with a few outliers, improving its description capability, but also extends to binary-class problems with roughly balanced data, or to imbalanced problems where the minority class matters more, improving its discrimination ability. Preliminary experiments on majority-normal data with 5 percent outliers show that BSD3M yields a more competitive boundary than the above one-class classifiers, owing to its consideration of the outliers.

(5) Generalized a series of linear programming algorithms for the above models with comparable power and the advantage of computational efficiency. By minimizing the average functional distance from the target data to the hyperplane, a linear programming algorithm, SlpOCSVM, is obtained, reducing the computational complexity from the O(n^3) of SOCSVM's quadratic programming down to O(n). Applying the same idea to multi-cluster data and replacing the whole covariance matrix with the sum of the per-cluster covariance matrices, a simplified version of SLMOCC sharply reduces the polynomial complexity of second-order cone programming. Likewise, embedding the class margin into SlpOCSVM yields a linear programming version of BSD3M. The experimental results for SlpOCSVM and the simplified SLMOCC show the validity of Structure-Driven Learning and of multi-cluster information embedding for these non-margin linear programming algorithms.
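The idea behind the linear programming relaxation — replace the quadratic margin objective by the average signed distance of the target data to the hyperplane, constrained so that the data stay on the positive side — can be illustrated with a toy linear formulation. This is a hedged sketch, not the thesis's SlpOCSVM: the normalization `sum(w) = 1` and all data are assumptions introduced here to keep the LP well-posed:

```python
import numpy as np
from scipy.optimize import linprog

# Toy target data.
rng = np.random.default_rng(2)
X = rng.normal([2.0, 2.0], 0.5, size=(150, 2))
n, d = X.shape

# Variables: w (d entries) and rho. Minimize the average signed distance
# (w.x_i - rho) of the target data to the hyperplane w.x = rho, subject to
# every target point lying on its positive side, with the normalization
# sum(w) = 1 ruling out the trivial solution w = 0.
c = np.concatenate([X.mean(axis=0), [-1.0]])   # objective: mean(w.x_i) - rho
A_ub = np.hstack([-X, np.ones((n, 1))])        # rho - w.x_i <= 0
b_ub = np.zeros(n)
A_eq = np.array([[1.0] * d + [0.0]])           # sum(w) = 1
b_eq = np.array([1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * (d + 1))
w, rho = res.x[:d], res.x[d]

def accept(x):
    # A point is classified "normal" if it lies on the positive side.
    return float(np.dot(w, x)) >= rho - 1e-9

print(accept(np.array([2.0, 2.0])), accept(np.array([-5.0, -5.0])))
```

All constraints and the objective are linear in (w, rho), which is what drops the complexity relative to a quadratic or second-order cone program; the thesis's versions additionally embed the structure and class-margin information.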
Keywords/Search Tags: One-Class Classifier Design, Support Vector Machine, Structure-Driven Learning, Multi-Cluster, Data Description & Discrimination, Quadratic/Second-Order Cone/Linear Programming