Unsupervised Anomaly Detection Based On Sparse Autoencoder And Ensemble Learning

Posted on: 2022-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: X H Liu
Full Text: PDF
GTID: 2518306782977489
Subject: Automation Technology
Abstract/Summary:
Unsupervised outlier detection refers to finding samples that deviate from the distribution of normal data without using data labels. Classical unsupervised outlier detection algorithms have good prospects in industrial applications. However, a single classical unsupervised outlier detection algorithm may not be enough to detect all the anomalies in a given dataset. In recent years, Locally Selective Combination in Parallel Outlier Ensembles (LSCP) has been proposed. It selects the most effective combination of unsupervised outlier detection algorithms for test points in different local regions, which improves detection performance compared with a single algorithm. However, the base detector combination integrated by LSCP may still fail on high-dimensional data, and LSCP also defines the local region of each test point by distance calculation. Therefore, when LSCP is applied to high-dimensional data, not only is detection performance greatly weakened, but the test time cost is also very high. To solve these problems, this paper proposes, for the first time, combining a sparse autoencoder (SAE) with LSCP as a new outlier detection method (SAELSCP4). SAELSCP4 uses the SAE latent space to represent the data in low dimension, and then feeds the low-dimensional representation into an LSCP framework with heterogeneous base detectors, isolation forest (IForest) and one-class SVM (OCSVM), to detect anomalies. SAELSCP4 uses no data labels at any point in training, and the training data contains outliers. As a shallow network, the SAE can express the local characteristics of the data well in its latent space and strengthens the detection performance of LSCP. Representative classical single unsupervised outlier detection methods, the LSCP method, and SAE combined with a single unsupervised anomaly detection method are selected as baselines for comparison. On the four public outlier detection datasets used in the experiments, the AUC of SAELSCP4 is the best on three, with an improvement of up to 6.69% on the Optdigits dataset in particular. On the remaining dataset, the AUC of SAELSCP4 is second best, only 0.06% below the best baseline. The proposed method also greatly reduces the test time cost of the original LSCP. Overall, SAELSCP4 therefore performs better than the other baseline methods.
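The pipeline described above (an SAE latent representation followed by locally selective combination of IForest and OCSVM) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the sparsity term is an L1 penalty on sigmoid activations, the local region is a plain k-nearest-neighbour set in latent space, the pseudo ground truth is the maximum of the standardized training scores, and all function names, hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import OneClassSVM


def fit_sparse_autoencoder(X, n_hidden=3, l1=1e-3, lr=0.1, epochs=200, seed=0):
    """One-hidden-layer autoencoder with an L1 penalty on hidden activations (assumed form)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(n_hidden)
    W2, b2 = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(d)
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))   # sigmoid codes, always positive
        R = H @ W2 + b2                            # linear reconstruction
        dR = 2.0 * (R - X) / n                     # gradient of the squared reconstruction error
        dH = dR @ W2.T + l1 / n                    # L1 term: sign(H) is 1 because H > 0
        dA = dH * H * (1.0 - H)                    # back through the sigmoid
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dA)
        b1 -= lr * dA.sum(axis=0)
    return lambda A: 1.0 / (1.0 + np.exp(-(A @ W1 + b1)))


def pearson(a, b):
    """Pearson correlation that returns 0.0 when either input is constant."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0


def lscp_like_scores(Z_train, Z_test, detectors, k=20):
    """Simplified local selection: per test point, use the detector that best
    agrees with the pseudo ground truth on its k nearest training points."""
    S_train, S_test = [], []
    for det in detectors:
        det.fit(Z_train)
        s_tr = -det.score_samples(Z_train)         # negate so that higher means more abnormal
        s_te = -det.score_samples(Z_test)
        mu, sd = s_tr.mean(), s_tr.std() + 1e-12
        S_train.append((s_tr - mu) / sd)           # standardize with training statistics
        S_test.append((s_te - mu) / sd)
    S_train, S_test = np.column_stack(S_train), np.column_stack(S_test)
    target = S_train.max(axis=1)                   # pseudo ground truth on training data
    _, idx = NearestNeighbors(n_neighbors=k).fit(Z_train).kneighbors(Z_test)
    out = np.empty(len(Z_test))
    for i, nb in enumerate(idx):
        corr = [pearson(S_train[nb, j], target[nb]) for j in range(S_train.shape[1])]
        out[i] = S_test[i, int(np.argmax(corr))]   # score from the locally best detector
    return out


# Synthetic, unlabeled data that contains outliers (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (300, 20)), rng.normal(6.0, 1.0, (15, 20))])
X = (X - X.mean(axis=0)) / X.std(axis=0)

encode = fit_sparse_autoencoder(X)
Z = encode(X)                                      # low-dimensional latent representation
detectors = [IsolationForest(random_state=0), OneClassSVM(nu=0.05, gamma="scale")]
scores = lscp_like_scores(Z, Z, detectors, k=20)   # higher score = more anomalous
```

Because the neighbour search runs in the latent space rather than the original feature space, the distance computations that define each local region operate on far fewer dimensions, which is the source of the test-time saving the abstract refers to. The full LSCP procedure is more involved than this sketch; the PyOD library provides an `LSCP` class that could replace `lscp_like_scores` here.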
Keywords/Search Tags: Unsupervised outlier detection, High-dimensional data, LSCP, Sparse autoencoder (SAE), Isolation forest (IForest), One-class SVM (OCSVM)