
Research On Anomaly Detection Algorithm Based On Random Forest

Posted on: 2020-03-03
Degree: Master
Type: Thesis
Country: China
Candidate: M Hu
GTID: 2428330620456743
Subject: Computer application technology
Abstract/Summary:
Data mining is the process of discovering hidden knowledge or patterns in data. Anomaly detection is an important data mining method: it plays a key role in industrial production and is widely used in financial anti-fraud, equipment fault detection, network intrusion detection, and medical image analysis, where it contributes to higher production efficiency and better living standards. Outlier detection based on ensemble models is currently an active research topic. This thesis studies the random forest model in depth, combines it with the characteristics of outliers, and proposes outlier detection algorithms based on the random forest model. The work consists of the following three parts:

(1) A random forest algorithm based on fuzzy tree nodes is proposed for anomaly detection. While the classification trees of the random forest are being constructed, a fuzzy method is introduced into the nodes of the binary decision trees: each node contains a fuzzy region around its class division, and normal and anomaly membership functions are defined on this region. When a sample passes through the fuzzy region of a node, it is classified as an anomaly if its anomaly membership degree is greater than its normal membership degree. Otherwise, the sample moves down to the next node, and it is identified as normal if there is no lower node. The final class of the sample is determined by the voting step of the random forest algorithm. Experiments show that the algorithm achieves high anomaly detection performance and is stable.

(2) A random forest algorithm combining double features and a relaxation boundary is proposed for anomaly detection. First, when the binary decision trees of the random forest are built using normal-class data only, the value ranges of two features are recorded at each node, and these double-feature value ranges serve as the basis for judging anomalies. Second, during detection, a sample that does not fall within the double-feature value ranges of a node is marked as a candidate anomaly; otherwise, it moves down to the next node and is compared with that node's double-feature value ranges, and it is marked as a candidate normal sample if there is no lower node. Finally, the discrimination mechanism of the random forest algorithm determines the class of the sample. Experiments show that the algorithm achieves high anomaly detection performance and is stable.

(3) A random forest algorithm based on a sample backtracking mechanism is proposed for anomaly detection. First, when the decision trees of the random forest are built using normal-class data only, the value ranges of two attributes are recorded at each node of the binary decision tree, and these double-attribute ranges serve as the basis for judging anomalies. Second, each decision tree uses its out-of-bag data to trace from the leaf nodes back to the root node in order to correct the double-attribute ranges stored in the nodes. During detection, a sample that does not fall within the double-attribute ranges of a node is marked as a candidate anomaly; otherwise, it moves down to the next node to continue the range comparison, and it is marked as a candidate normal sample if there is no lower node. Finally, the discrimination mechanism of the random forest algorithm determines the final category of the sample. Anomaly detection experiments on six UCI data sets show that the overall performance of the new method is comparable to or better than that of the compared methods and remains stably high.

Anomalies are characterized by key feature values that differ significantly from those of normal samples, which is the fundamental reason they can be isolated from normal samples. This thesis combines this property with the hierarchical structure of the decision tree so that anomalies are detected during decision tree classification, and the resulting algorithms are efficient and stable.
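The decision rule of the first algorithm described above (fuzzy tree nodes) can be illustrated with a minimal Python sketch. The abstract does not specify the shape of the membership functions or how the fuzzy region is sized, so this sketch assumes a region of hypothetical half-width `width` centered on each node's split threshold, linear complementary memberships, and a per-node `anomaly_side` remembered from training. `FuzzyNode`, `memberships`, `tree_predict`, and `forest_predict` are illustrative names, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class FuzzyNode:
    feature: int                       # index of the split feature
    threshold: float                   # crisp split point of this node
    width: float                       # half-width of the fuzzy region, must be > 0 (hypothetical)
    anomaly_side: str                  # "left" or "right": side where anomalies dominated (assumption)
    left: Optional["FuzzyNode"] = None
    right: Optional["FuzzyNode"] = None


def memberships(node: FuzzyNode, x: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (normal, anomaly) membership degrees, or None outside the fuzzy region."""
    v = x[node.feature]
    lo, hi = node.threshold - node.width, node.threshold + node.width
    if not lo <= v <= hi:
        return None
    # Linear position inside the region, oriented so that 1.0 is the anomaly-side edge.
    pos = (v - lo) / (hi - lo)
    if node.anomaly_side == "left":
        pos = 1.0 - pos
    return 1.0 - pos, pos              # complementary linear memberships (an assumption)


def tree_predict(root: Optional[FuzzyNode], x: Sequence[float]) -> int:
    """Return 1 (anomaly) or 0 (normal) for a single fuzzy-node tree."""
    node = root
    while node is not None:
        m = memberships(node, x)
        if m is not None and m[1] > m[0]:
            return 1                   # anomaly membership wins inside the fuzzy region
        node = node.left if x[node.feature] <= node.threshold else node.right
    return 0                           # no lower node and never flagged: normal


def forest_predict(trees, x: Sequence[float]) -> int:
    """Strict majority vote over the trees (ties count as normal)."""
    votes = sum(tree_predict(t, x) for t in trees)
    return 1 if 2 * votes > len(trees) else 0


# Hand-built toy tree: root splits feature 0 at 5.0, right child splits feature 1 at 0.0.
tree = FuzzyNode(0, 5.0, 1.0, "right", right=FuzzyNode(1, 0.0, 0.5, "left"))
print(tree_predict(tree, [5.5, 3.0]))   # 1: inside the root's fuzzy region, anomaly side
print(tree_predict(tree, [4.5, 3.0]))   # 0: normal side of the region, no left child
print(tree_predict(tree, [9.0, -0.2]))  # 1: flagged by the child node's fuzzy region
trees = [tree, FuzzyNode(0, 0.0, 1.0, "left"), tree]
print(forest_predict(trees, [5.5, 3.0]))  # 1: two of three trees vote anomaly
```

With complementary linear memberships the test "anomaly membership > normal membership" reduces to "inside the region and past the threshold on the anomaly side"; non-complementary membership shapes would make the rule less trivial, and the thesis may well use them.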
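The second algorithm described above (double features with a relaxation boundary) can be sketched as follows, assuming plain Python lists as samples. The abstract does not say which two features each node monitors, how splits are chosen, or how the boundary is relaxed, so this sketch picks two random features per node, splits on the median of the first, and widens each observed [min, max] range by a hypothetical fraction `relax` of its span. Trees are grown on bootstrap samples of normal data only, and the forest takes a majority vote over per-tree candidate labels. All names are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Range = Tuple[float, float]


@dataclass
class RangeNode:
    features: Tuple[int, int]          # the two monitored features at this node
    ranges: Tuple[Range, Range]        # relaxed [low, high] interval for each of them
    split_feature: int = 0
    split_value: float = 0.0
    left: Optional["RangeNode"] = None
    right: Optional["RangeNode"] = None


def relaxed_range(values: List[float], relax: float) -> Range:
    """Observed [min, max] widened on both sides by relax * span (hypothetical rule)."""
    lo, hi = min(values), max(values)
    margin = relax * (hi - lo)
    return lo - margin, hi + margin


def build_tree(data, depth, max_depth, relax, rng) -> RangeNode:
    """Grow a tree from normal samples only; each node records ranges of two random features."""
    f1, f2 = rng.sample(range(len(data[0])), 2)
    node = RangeNode((f1, f2), (relaxed_range([r[f1] for r in data], relax),
                                relaxed_range([r[f2] for r in data], relax)))
    if depth >= max_depth or len(data) < 4:
        return node                    # leaf
    split = sorted(r[f1] for r in data)[len(data) // 2]   # median split (an assumption)
    left = [r for r in data if r[f1] < split]
    right = [r for r in data if r[f1] >= split]
    if left and right:
        node.split_feature, node.split_value = f1, split
        node.left = build_tree(left, depth + 1, max_depth, relax, rng)
        node.right = build_tree(right, depth + 1, max_depth, relax, rng)
    return node


def tree_vote(node: RangeNode, x: Sequence[float]) -> int:
    """1 = candidate anomaly, 0 = candidate normal."""
    while True:
        for f, (lo, hi) in zip(node.features, node.ranges):
            if not lo <= x[f] <= hi:
                return 1               # violates a recorded double-feature range
        if node.left is None:
            return 0                   # reached a leaf without any violation
        node = node.left if x[node.split_feature] < node.split_value else node.right


def build_forest(normal_data, n_trees=25, max_depth=4, relax=0.1, seed=0):
    rng = random.Random(seed)
    return [build_tree(rng.choices(normal_data, k=len(normal_data)), 0, max_depth, relax, rng)
            for _ in range(n_trees)]


def forest_predict(trees, x: Sequence[float]) -> int:
    votes = sum(tree_vote(t, x) for t in trees)
    return 1 if 2 * votes > len(trees) else 0


normal = [[i / 10, j / 10] for i in range(11) for j in range(11)]  # synthetic normal data
forest = build_forest(normal)
print(forest_predict(forest, [5.0, 5.0]))  # 1: outside every root's recorded ranges
```

Larger `relax` values trade detection sensitivity for fewer false alarms on normal samples that lie just outside the ranges seen during training.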
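For the third algorithm described above (sample backtracking), the abstract states only that each tree uses its out-of-bag data to trace from leaf nodes to the root and correct the double-attribute ranges. One plausible reading, sketched below as an assumption rather than the thesis's actual rule, is: route each out-of-bag normal sample down to its leaf, then walk back up to the root and widen every node's ranges just enough to contain that sample, which reduces false alarms on normal data. `Node`, `path_to_leaf`, `backtrack_correct`, and `tree_vote` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    features: Tuple[int, int]          # the two monitored attributes
    ranges: List[List[float]]          # mutable [low, high] per monitored attribute
    split_feature: int = 0
    split_value: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def path_to_leaf(root: Node, x: Sequence[float]) -> List[Node]:
    """Nodes visited from the root down to the leaf that x is routed to by the splits."""
    path = [root]
    node = root
    while node.left is not None and node.right is not None:
        node = node.left if x[node.split_feature] < node.split_value else node.right
        path.append(node)
    return path


def backtrack_correct(root: Node, oob_normal: Sequence[Sequence[float]]) -> None:
    """For each out-of-bag normal sample, walk leaf -> root and widen each node's
    ranges to contain it (a hypothetical correction rule)."""
    for x in oob_normal:
        for node in reversed(path_to_leaf(root, x)):
            for i, f in enumerate(node.features):
                lo, hi = node.ranges[i]
                node.ranges[i] = [min(lo, x[f]), max(hi, x[f])]


def tree_vote(root: Node, x: Sequence[float]) -> int:
    """1 = candidate anomaly (a range on the path is violated), 0 = candidate normal."""
    for node in path_to_leaf(root, x):
        for f, (lo, hi) in zip(node.features, node.ranges):
            if not lo <= x[f] <= hi:
                return 1
    return 0


left = Node((0, 1), [[0.0, 0.4], [0.0, 1.0]])
right = Node((0, 1), [[0.6, 1.0], [0.0, 1.0]])
root = Node((0, 1), [[0.0, 1.0], [0.0, 1.0]], 0, 0.5, left, right)
sample = [0.45, 0.5]                   # an out-of-bag normal sample
print(tree_vote(root, sample))         # 1: outside the left leaf's range for attribute 0
backtrack_correct(root, [sample])
print(tree_vote(root, sample))         # 0 after the correction
print(left.ranges[0])                  # [0.0, 0.45]
```

Because the out-of-bag samples come from the normal class, widening only to the observed values keeps the correction conservative; a robust variant might ignore out-of-bag samples that fall far outside the existing ranges.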
Keywords/Search Tags: Fuzzy membership function, Fuzzy tree node, Double-feature filtering, Relaxation boundary, Sample-backtracking