
Anomaly Detection Research Based On Multiple Sampling And Dimension Entropy

Posted on: 2020-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: B Luo
GTID: 2428330575994251
Subject: Computer software and theory
Abstract/Summary:
Outliers are data instances that differ from, are inconsistent with, or deviate substantially from most other instances in a data set. The main task of anomaly detection is to find these outliers. Anomaly detection is of significant research interest because it can reveal abnormal phenomena and behaviors. It already has many representative applications, such as credit card fraud detection, medical diagnosis, environmental monitoring, and gene sequence research, where it provides key, actionable information for daily life and social production. Many anomaly detection algorithms have been proposed in academia. They can be divided into five categories: algorithms based on statistical models, on distance, on density, on subspaces, and on ensemble learning. We summarize these five categories and analyze their respective advantages and disadvantages. Several frequently used algorithms and two main evaluation methods are also introduced. We then propose two new anomaly detection algorithms.

1) As the size and dimensionality of data sets have increased in recent years, higher requirements have been placed on the speed, accuracy, and stability of anomaly detection algorithms. Traditional k-nearest-neighbor search methods struggle to meet these requirements, and methods based on one-time sampling are not stable enough. To address these problems, we propose a new nearest-neighbor anomaly detection algorithm based on multiple sampling, abbreviated MS-1NN (Nearest Neighbor Based on Multiple Sampling). The algorithm is compared with LOF, SOD, and other algorithms in experiments. The results show that MS-1NN achieves good detection results on most data sets using default parameters and a very small training model, and that it produces relatively stable results in a relatively short time.

2) Anomaly detection, as an important basic research task in data mining, has attracted attention from both industry and academia. Among the many anomaly detection methods, iForest (isolation Forest) has low time complexity and good detection performance, and it adapts well to large, high-dimensional data. However, iForest is not suitable for certain special high-dimensional data, is not stable enough, and is not very robust to noise features. To address these problems, this thesis proposes E-iForest (entropy-isolation forest), an improved anomaly detection method based on dimension entropy. The method introduces dimension entropy as the basis for selecting the isolation attribute and the isolation point during training, uses three isolation strategies, and adjusts the path length calculation. The experiments show that, compared with iForest, E-iForest detects anomalies better, runs faster on large data sets, is more stable, and is more robust to noise features.
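
The abstract does not give the details of MS-1NN, so the following is only a minimal sketch of the general idea of nearest-neighbor scoring over multiple random samples. The function name `ms_1nn_scores`, the parameters `n_rounds` and `sample_size`, the Euclidean distance, and the averaging rule are all assumptions made for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def ms_1nn_scores(X, n_rounds=10, sample_size=20, seed=0):
    """Hypothetical multiple-sampling 1-NN anomaly scorer (illustrative only).

    In each round a small random subsample is drawn, and every point is
    scored by its Euclidean distance to the nearest point of that
    subsample. Scores are averaged over rounds to reduce the variance
    that a single (one-time) sample would have.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    m = min(sample_size, n)
    if m < 2:
        raise ValueError("need at least two points to form a sample")
    rng = np.random.default_rng(seed)
    scores = np.zeros(n)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=m, replace=False)
        sample = X[idx]
        # Pairwise distances between all points and the sample: shape (n, m).
        d = np.linalg.norm(X[:, None, :] - sample[None, :, :], axis=2)
        # A sampled point must not count itself as its own nearest neighbor.
        d[idx, np.arange(m)] = np.inf
        scores += d.min(axis=1)
    return scores / n_rounds

# Toy data (assumed): 200 Gaussian inliers plus one far-away point.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[10.0, 10.0]]])
scores = ms_1nn_scores(X)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

Higher scores indicate points that lie far from every sample drawn. Each round touches only `sample_size` reference points, which is why a sampling approach of this kind can keep the stored model small.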
Keywords/Search Tags:anomaly detection, multiple sampling, dimensional entropy, nearest neighbor, isolation forest
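
The abstract does not define dimension entropy. One common way to measure how spread out a single attribute is uses the Shannon entropy of its histogram, and the sketch below follows that assumption. The function name `dimension_entropy`, the equal-width binning, and the `n_bins` parameter are hypothetical and may differ from the definition used in the thesis.

```python
import numpy as np

def dimension_entropy(X, n_bins=10):
    """Shannon entropy (in bits) of an equal-width histogram per dimension.

    Assumed definition for illustration: a dimension whose values fill
    the bins evenly has high entropy, while a dimension whose values
    concentrate in a few bins has low entropy.
    """
    X = np.asarray(X, dtype=float)
    entropies = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        counts, _ = np.histogram(X[:, j], bins=n_bins)
        p = counts[counts > 0] / counts.sum()  # drop empty bins: 0*log(0) = 0
        entropies[j] = -np.sum(p * np.log2(p))
    return entropies

# Toy data (assumed): column 0 is uniform noise, column 1 is tightly
# clustered except for a single extreme value.
rng = np.random.default_rng(0)
noise = rng.uniform(0.0, 1.0, size=1000)
clustered = rng.normal(0.0, 0.01, size=1000)
clustered[0] = 100.0
ent = dimension_entropy(np.column_stack([noise, clustered]))
print(ent)
```

In this toy data the uniform noise column receives a much higher entropy than the clustered column. How E-iForest turns such values into a choice of isolation attribute and isolation point is not stated in the abstract, so that step is not sketched here.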