Algorithm Improvements For Anomaly Detection And Their Applications

Posted on: 2022-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Wu
Full Text: PDF
GTID: 2518306491477214
Subject: Applied Statistics
Abstract/Summary:
Anomaly detection is the process of identifying outliers (values that are inconsistent with the normal patterns in a data set). It has a broad practical background and considerable application value. For example, detecting abnormally driven vehicles supports the normal operation of the traffic system, identifying abnormal stock transactions is conducive to the healthy development of the stock market, detecting abnormal cracks in bridges plays a key role in project safety, and identifying network intrusions protects the security and privacy of users. Because of this wide range of applications, anomaly detection has received increasing attention. Common anomaly detection algorithms, such as the 2σ rule, isolation forest, LOF, and neural network reconstruction, have achieved some success. However, problems remain, such as a high false positive rate and unsatisfactory timeliness, which hinder wider adoption of these methods. According to our research, this may be caused by the complex characteristics of outlier types across different industries. Against this background, this dissertation addresses anomaly detection from three perspectives and improves the performance of the corresponding algorithms.

First, a second nearest neighbor anomaly detection algorithm based on autoencoding is proposed. In general, detection algorithms may perform poorly when the abnormal points are few but complex. This dissertation extracts data features with an autoencoder and then applies the isolation forest algorithm to detect abnormal values in the data set. The idea of ensembling is introduced to avoid subjective parameter selection. Using three data sets from the DAMI database, this dissertation compares the new algorithm with seven classical algorithms, such as LOF, isolation forest, and Fast ABOD, on five aspects of performance. The results show that the new algorithm improves performance significantly, and they demonstrate the influence of outlier types on algorithm performance.

Second, a clustering-based ensemble algorithm for anomaly detection is presented. To address the research gap concerning the characteristics of outlier types, this dissertation segments the data by clustering and runs the LOF algorithm under different parameter settings on each cluster. A similarity matrix is built from the anomaly scores of the samples, and the most similar anomaly detector is selected according to the degree of the similarity matrix. The fitted model is saved as a PKL file so that it can be loaded directly when new data need to be predicted. The prediction for a new sample aggregates the results from all clusters, with each cluster's aggregation weight given by the difference between that cluster and the other clusters. In addition, to further explore the characteristics of outlier types, the algorithm is applied to a real insurance data set, where it finds four types of outliers and their corresponding characteristics, and corresponding solutions are provided.

Third, a time series anomaly detection and repair algorithm based on VMD spectral analysis is proposed. Many anomaly detection algorithms based on reconstructing time series data have been proposed; they identify outliers by comparing reconstruction errors with preset thresholds. However, they suffer from heavy computation and error accumulation. In this dissertation, features of the modes obtained by VMD are extracted through a sliding window, and the eigenvalues and eigenvectors of the Laplacian matrix within each window are computed. Since the eigenvalues and eigenvectors carry important information about the matrix, the change between the first and second eigenvalues is used to measure the degree of anomaly of the window. After the abnormal windows are identified, a GRU neural network with an embedded attention mechanism is combined with the local time series to repair the outliers. Finally, the results show that all abnormal events in the Real Traffic data set of the NAB-Master database can be identified.

Traditional algorithms often treat outliers as a single type of data and perform poorly on complex data. In this dissertation, outlier types are taken into account in the algorithm design to address the high false positive rate. In addition, different parameter settings strongly affect the results of anomaly detection algorithms. To solve this problem, a combination of ensembling and graph degree is used to screen parameters, which overcomes the subjective choice of parameters to some extent. Building on this idea, this dissertation explores the characteristics of outlier types and applies the algorithm to insurance claim data to obtain features of practical value, which provides a theoretical basis for enterprise management and operation. For outlier detection in time series, outliers are defined by changes in eigenvalues and eigenvectors, which improves the sensitivity of the detection algorithm and helps alleviate the high time cost of time series anomaly detection.
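
The sliding-window eigenvalue scoring in the time series part can be illustrated with a small sketch. The abstract does not say how the graph in each window is built or which eigenvalues count as "first" and "second", so everything below is an assumption made for illustration: each time step in a window is treated as a node, edges use a Gaussian similarity with a hypothetical `bandwidth` parameter, the Laplacian is the unnormalised L = D - W, and the score is the gap between its two smallest eigenvalues. The VMD step itself is not included; two synthetic sinusoids stand in for the decomposed modes, and the function names `laplacian_eigengap` and `score_windows` are invented here.

```python
import numpy as np


def laplacian_eigengap(window: np.ndarray, bandwidth: float = 1.0) -> float:
    """Gap between the two smallest eigenvalues of a window's graph Laplacian.

    `window` has shape (n_points, n_modes): one graph node per time step.
    """
    # Pairwise squared Euclidean distances between time steps.
    diff = window[:, None, :] - window[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Gaussian similarity graph (assumed construction), without self-loops.
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(weights, 0.0)
    # Unnormalised graph Laplacian L = D - W.
    laplacian = np.diag(weights.sum(axis=1)) - weights
    eigenvalues = np.linalg.eigvalsh(laplacian)  # returned in ascending order
    return float(eigenvalues[1] - eigenvalues[0])


def score_windows(modes: np.ndarray, width: int, step: int) -> list:
    """Slide a window over `modes` (shape (n_modes, n_samples)).

    Returns a list of (window_start, eigengap) pairs.
    """
    n_samples = modes.shape[1]
    return [
        (start, laplacian_eigengap(modes[:, start:start + width].T))
        for start in range(0, n_samples - width + 1, step)
    ]


# Stand-in for VMD output: two synthetic modes with a spike injected at t = 120.
t = np.arange(200)
modes = np.vstack([np.sin(2 * np.pi * t / 50), 0.5 * np.cos(2 * np.pi * t / 10)])
modes[0, 120] += 10.0

scores = score_windows(modes, width=20, step=20)
# Under this construction a near-zero gap means one point is almost
# disconnected from the rest of its window.
flagged_start, flagged_gap = min(scores, key=lambda pair: pair[1])
print(flagged_start, flagged_gap)
```

In this toy setting the window that starts at t = 120 is the one flagged, because the spike is nearly disconnected from the other points and pulls the second-smallest eigenvalue toward zero. Whether the dissertation scores windows in this direction, or tracks the change of the gap from one window to the next, cannot be determined from the abstract.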
Keywords/Search Tags: Anomaly detection, Time series, Feature extraction, VMD, GRU