Research On Outlier Detection And Its Parameter Optimization Algorithm

Posted on: 2021-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Ning
GTID: 1368330647960767
Subject: Computer Science and Technology
Abstract/Summary:
Outlier detection is an important data mining technique, with applications ranging from national security to personal health and from network intrusion detection to medical diagnosis. Whenever the target is "unusual" data, outlier detection techniques can be applied in place of manual methods. Although much progress has been made in these fields, problems such as parameter dependence and low detection accuracy remain. To address these issues, this thesis explores parameter optimization for outlier detection, improvements in detection performance across multiple scenarios, and metrics for evaluating algorithm results. The main contributions of this thesis are as follows:

(1) For the problem of optimizing the parameter k (neighborhood size), this thesis presents a parameter k search algorithm based on the mutual neighbor graph (MNG). The algorithm defines a method for describing the stable state of the MNG and selects the parameter k of the proximity algorithm by searching for that stable state. The experimental results show that this algorithm achieves better AUC than other parameter k selection algorithms.

(2) For scenarios with many kinds of outliers, complex patterns, and a lack of labels, this thesis proposes a novel outlier detection method called Active Autoencoder (AAE). The method combines influence-based active learning with a new expansion-shrinkage operator to improve the detection capability of the autoencoder network when outliers are sparse. The experimental results show that the proposed method detects inconsistencies in the image dataset more accurately than other methods.

(3) Because density-based outlier detection methods have difficulty identifying outliers in low-density patterns, this thesis proposes an outlier detection algorithm based on relative density. The algorithm introduces a novel measure of the neighborhood density of data points that does not limit the neighborhood size. Experimental results show that the algorithm detects outliers in low-density patterns more accurately.

(4) For outlier detection on multiple time series, two new outlier detection models are proposed:

a) To address the high manual dependence and frequent triggering of model Aggregation and Disaggregation (AD) mechanisms, this thesis proposes a focus-area multi-element temporal outlier detection algorithm. First, the algorithm divides the data into focus-areas based on attention neighbors. Second, the outlier score of each focus-area is calculated from the k-distance outlier scores of the entities in that focus-area. Finally, a trigger mechanism for AD is constructed based on a strongest-focus-area threshold decision method. The experimental results show that the proposed algorithm not only determines the trigger time of the AD operation promptly, but also enables the simulation system to intelligently detect simulation entities in sudden situations and to meet the requirements of multi-resolution modeling.

b) To address the problem of multi-object spatiotemporal anomaly detection, this thesis proposes a framework based on a Long Short-Term Memory (LSTM) network. The framework uses the LSTM to compute reconstruction errors, and uses anomaly scores based on the Display Constraint Graph to determine anomalous subsequences and anomalous objects. Experimental results show that the method achieves higher anomaly detection accuracy than traditional methods.

(5) To address the limited variety and poor adaptability of existing evaluation metrics for outlier detection, this thesis proposes two new evaluation metrics: the first type, high true rate metrics (HT-AUC), and the second type, low false positive rate metrics (LF-AUC). Both build on the existing area-under-the-curve method and adapt it to the requirements of a high true rate and a low false positive rate, respectively. The experimental results show that the proposed metrics are better suited to evaluating the effectiveness and quantitative integration of outlier detection algorithms.
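
The abstract gives no implementation details, so the following is only a minimal Python sketch of three generic building blocks it mentions: a k-distance outlier score, a mutual neighbor graph built from k-nearest-neighbor lists, and AUC. All function names, the toy data, and the range of k are assumptions made for illustration. The thesis's stable-state criterion for the MNG and its proposed evaluation metrics are not described in enough detail to reproduce and are not attempted here.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in dists[:k]])
    return result


def k_distance_scores(points, k):
    """Outlier score = distance to the k-th nearest neighbour (larger = more unusual)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def mutual_neighbor_edges(points, k):
    """Undirected edges (i, j), i < j, where i and j appear in each other's k-NN lists."""
    nn = [set(n) for n in knn_indices(points, k)]
    return {(i, j) for i in range(len(points)) for j in nn[i] if i < j and i in nn[j]}


def roc_auc(labels, scores):
    """AUC as the chance that a random outlier outscores a random inlier (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: five clustered inliers and one far-away outlier (label 1).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
labels = [0, 0, 0, 0, 0, 1]

# Sweep a few neighbourhood sizes and report graph size and detection quality.
for k in (1, 2, 3):
    scores = k_distance_scores(points, k)
    edges = mutual_neighbor_edges(points, k)
    print(f"k={k}: mutual edges={len(edges)}, AUC={roc_auc(labels, scores):.2f}")
```

On this toy data the far-away point receives the largest k-distance score for every k in the sweep and has no mutual neighbors, which illustrates why both the score and the graph structure depend on the choice of k. A brute-force neighbor search is used for clarity; a real data set would call for a spatial index.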
Keywords/Search Tags: outlier detection, parameter k, autoencoder, aggregation and disaggregation, evaluation metrics