Anomaly Detection Research Based On Multiple Sampling And Dimension Entropy

Posted on:2020-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:B Luo

GTID:2428330575994251

Subject:Computer software and theory

Abstract/Summary:

Outliers are data instances that differ from, are inconsistent with, and deviate substantially from most other instances in a data set. The main task of anomaly detection is to find such outliers. Anomaly detection has considerable research significance because it can reveal abnormal phenomena and behaviors. It already has many representative applications, such as credit card fraud detection, medical diagnosis, environmental monitoring, and gene sequence research, where it provides key, actionable information in various fields of daily life and social production.

Many anomaly detection algorithms have been proposed in academia. They can be divided into five categories: algorithms based on statistical models, on distance, on density, on subspaces, and on ensemble learning. We summarize these five categories and analyze their respective advantages and disadvantages, and we introduce some frequently used algorithms and two main evaluation methods. We then propose two new anomaly detection algorithms.

1) With the growth in the size and dimensionality of data sets in recent years, higher requirements are placed on the speed, accuracy, and stability of anomaly detection algorithms. Traditional k-nearest-neighbor search methods struggle to meet these requirements, and methods based on one-time sampling are not stable enough. To address these problems, we propose a new nearest-neighbor anomaly detection algorithm based on multiple sampling, abbreviated MS-1NN (Nearest Neighbor Based on Multiple Sampling). In the experiments, the algorithm is compared with LOF, SOD, and other algorithms. The results show that MS-1NN achieves good detection results on most data sets using default parameters and a very small training model, and that it produces relatively stable results in a relatively short time.

2) Anomaly detection, as an important basic research task in data mining, has attracted attention from both industry and academia. Among the many anomaly detection methods, iForest (isolation forest) has low time complexity and good detection performance, and it adapts well to large, high-dimensional data. However, iForest is not suitable for certain special high-dimensional data, is not stable enough, and is not very robust to noise features. To address these problems, this thesis proposes E-iForest (entropy-isolation forest), an improved anomaly detection method based on dimension entropy. The method introduces dimension entropy as the basis for selecting the isolation attribute and the isolation point during training, uses three isolation strategies, and adjusts the path length calculation. The experiments show that E-iForest detects anomalies better than iForest, runs faster on large data sets, is more stable, and is more robust to noise features.
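
The abstract names the two ideas but does not give their definitions, so the following Python sketch is only an illustration under stated assumptions and not the thesis's actual algorithms. It assumes that "nearest neighbor based on multiple sampling" can be approximated by averaging each point's distance to its nearest neighbor over several small random subsamples, and that "dimension entropy" can be approximated by the Shannon entropy of an equal-width histogram of one feature. The function names and the parameters n_samples, sample_size, and bins are hypothetical.

```python
import math
import random


def ms_1nn_scores(points, n_samples=10, sample_size=16, seed=0):
    """Assumed reading of multiple-sampling 1-NN scoring.

    For each of n_samples random subsamples, measure every point's
    distance to its nearest neighbor in that subsample, then average.
    A larger score means the point looks more anomalous.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    # At least two sampled points, so a neighbor other than the
    # point itself always exists.
    size = max(2, min(sample_size, len(points)))
    totals = [0.0] * len(points)
    for _ in range(n_samples):
        sample_idx = rng.sample(range(len(points)), size)
        for i, p in enumerate(points):
            # Exclude the point itself if it was drawn into the subsample.
            dists = [math.dist(p, points[j]) for j in sample_idx if j != i]
            totals[i] += min(dists)
    return [t / n_samples for t in totals]


def dimension_entropy(values, bins=10):
    """Assumed definition: Shannon entropy (bits) of an equal-width
    histogram over the values of a single feature."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature carries no information
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


if __name__ == "__main__":
    # A tight 5x5 grid near the origin plus one distant point.
    data = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]
    data.append((10.0, 10.0))
    scores = ms_1nn_scores(data, n_samples=10, sample_size=8)
    print("most anomalous index:", scores.index(max(scores)))
    print("entropy of first feature:", dimension_entropy([p[0] for p in data]))
```

Averaging over several subsamples is what makes the score less dependent on any single random draw, which is the stability concern the abstract raises about one-time sampling. How E-iForest actually uses entropy to pick the isolation attribute and isolation point is not stated in the abstract, so the sketch stops at computing a per-feature entropy value.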

Keywords/Search Tags:

anomaly detection, multiple sampling, dimensional entropy, nearest neighbor, isolation forest

Related items

1	Increasing the precision of forest area estimates through improved sampling for nearest neighbor satellite image classification
2	Anomaly Detection Of Malicious Android Applications Based On K-Nearest Neighbor
3	Study Of Outlier Detecting Algorithm Based On Natural Nearest Neighbor And Weighted Attribute Entropy
4	Research Of Anomaly Detection Method For High Dimensional Data
5	Research On Online Anomaly Detection Method Of Network Data Stream Based On Isolation Forest
6	Study On Collective Anomaly Detection And Optimization Based On Statistical Distance
7	The Index For The Nearest Neighbor Queries In High Dimensional Space
8	Research On Parallelization Of Isolation Forest Algorithm Based On Spark
9	Multiple Hash Tables Indexing And Optimization For Approximate Nearest Neighbor Search
10	Research On Anomaly Detection Based On Ensemble Learning Algorithms