
Outlier Detection By Using Synthetic Strategy Support Vector Machine

Posted on: 2011-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: M L Liu
GTID: 2178330332461513
Subject: Control theory and control engineering
Abstract/Summary:
Increasing attention is being paid to data mining, an effective tool for discovering useful information and latent knowledge in massive data. Outlier detection aims to find data objects that do not conform to the general behavior of a dataset and to extract useful information from them. Because it applies to a wide range of tasks, such as detecting crime in financial activities, medical diagnosis, image processing, and computer network intrusion detection, it has become an important research area of data mining and has made great progress. However, as the range of applications expands, existing methods no longer meet the requirements and run into problems with, for example, the generalization ability and the robustness and stability of their models, so more effective and stable outlier detection methods are needed. Moreover, in applications of outlier detection there are usually plenty of normal data but very few outliers, which are difficult or expensive to obtain. Most outlier detection methods therefore build a model or function from the normal data alone and rarely use the labeled outliers, so the resulting models may not reflect the actual situation in classification.
This thesis studies outlier detection for problems with a large amount of normal data and few labeled outliers, and presents a new method called the synthetic strategy support vector machine, which is based on support vector machines and kernel theory. The basic idea of the algorithm is as follows. The problem is first treated as a binary classification problem with imbalanced data, and a hyperplane that separates the normal data from the outliers with maximum margin is built in the feature space. Because misclassifying an outlier is usually very costly, the hyperplane is at the same time placed as close as possible to the normal data in order to improve the detection accuracy for outliers.
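The abstract does not state the mathematical model itself. As a rough illustration only of the cost-sensitive part of the idea (misclassifying an outlier is penalized more heavily than misclassifying a normal point), a standard class-weighted soft-margin SVM can be written as below. This is an assumed, generic formulation and not the thesis's model: the labels $y_i = +1$ (normal) and $y_i = -1$ (outlier), the feature map $\phi$, and the penalties $C^{+}$ and $C^{-}$ are illustrative symbols, and the second criterion (placing the hyperplane as close as possible to the normal data) is not captured here.

```latex
% Assumed generic class-weighted soft-margin SVM, NOT the thesis's formulation.
% y_i = +1: normal data, y_i = -1: labeled outliers, with C^{-} > C^{+}.
\begin{aligned}
\min_{w,\,b,\,\xi}\quad
  & \tfrac{1}{2}\lVert w\rVert^{2}
    + C^{+}\sum_{i:\,y_i=+1}\xi_i
    + C^{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad
  & y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i, \\
  & \xi_i \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

Choosing $C^{-} > C^{+}$ makes slack on the few labeled outliers more expensive than slack on the many normal points, which is one common way to handle imbalanced binary classification.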
The thesis gives a detailed description of the algorithm, covering its mathematical model, dual problem, and solution.
Finally, simulation experiments are carried out on six outlier detection datasets of three kinds, and three evaluation metrics, g-means, true positive rate, and false positive rate, are used to evaluate the algorithm from different aspects. The experimental results show that, compared with existing methods, the algorithm not only improves outlier detection performance but also classifies the normal data correctly.
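The three evaluation metrics can be computed from a confusion matrix. The following is a minimal sketch that assumes outliers are the positive class (label 1) and normal data the negative class (label 0), and that g-means is the geometric mean of the true positive rate and the true negative rate. The function name `detection_metrics` and the label convention are illustrative assumptions, not taken from the thesis.

```python
import math


def detection_metrics(y_true, y_pred, positive=1):
    """Return (true positive rate, false positive rate, g-means).

    Assumption: `positive` marks the outlier class; every other label
    is treated as normal data.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # outlier correctly detected
            else:
                fn += 1  # outlier missed
        else:
            if pred == positive:
                fp += 1  # normal point flagged as outlier
            else:
                tn += 1  # normal point correctly accepted

    # Guard against empty classes to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0

    # g-means: geometric mean of TPR and TNR, where TNR = 1 - FPR.
    g_means = math.sqrt(tpr * (1.0 - fpr))
    return tpr, fpr, g_means


if __name__ == "__main__":
    # Toy labels: 2 outliers and 8 normal points (made-up data).
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
    tpr, fpr, g = detection_metrics(y_true, y_pred)
    print(f"TPR={tpr:.3f} FPR={fpr:.3f} g-means={g:.3f}")
```

Unlike plain accuracy, g-means drops to zero if either class is entirely misclassified, which is why it is often preferred on imbalanced data such as outlier detection problems.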
Keywords/Search Tags: Outlier Detection, One-Class Classification, Support Vector Machine, Kernel Function