Research Of Anomaly Detection Method Based On Hash Mapping And Isolation Principle

Posted on: 2021-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Li
GTID: 2428330614458544
Subject: Control engineering

Abstract/Summary:
The development of Internet technology keeps raising the demands placed on data in machine learning and data mining, and research on detecting anomalies in data has deepened accordingly. Depending on the underlying model, current outlier detection methods are usually grouped into statistics-based, distance-based, density-based, subspace-based, and ensemble-learning-based methods. Weighing the strengths and weaknesses of these approaches, this thesis proposes two anomaly detection methods based on the isolation principle.

To address the low detection accuracy of the Isolation Forest (iForest) algorithm on large data sets that are high-dimensional, massive, and weakly correlated across attributes, this thesis proposes an anomaly detection method based on the Exact Euclidean Locality-Sensitive Hashing (E2LSH) algorithm and the isolation principle. First, the method applies the random hash functions of E2LSH to the original data set, mapping it to a lower dimension. Second, it uses the distance relationships between data points to compute the primary and secondary hash functions; the purpose of this hash-bucket computation is that data points that are close to each other in the original data space fall into the same bucket. These steps yield the dimension-reduced sub-data sets produced by the bucket split. The Isolation Forest algorithm is then applied to these dimension-reduced sub-data sets to detect anomalies. Finally, the thesis proposes a mean-based optimization strategy for selecting the best split attribute and split value when each isolation tree is built. Experimental results show that, compared with the purely random isolation trees generated by the standard Isolation Forest algorithm, the proposed method needs only a few isolation trees to form the forest and effectively improves the global anomaly detection accuracy of Isolation Forest on high-dimensional, massive, low-correlation data sets.

To improve the local anomaly detection accuracy of Isolation Forest on high-dimensional massive data sets, this thesis further proposes an anomaly detection method based on Kernel Locality-Sensitive Hashing and the isolation principle. First, the method uses Gaussian Kernel Locality-Sensitive Hashing to construct kernel hash function instances that kernelize the data set, mapping the data from the original data space into a high-dimensional feature space and thereby turning the local anomaly problem into a global anomaly problem. Then, the mean-optimized Isolation Forest algorithm is applied to the kernelized data to detect anomalies. Experimental results show that the proposed method retains the ability of Isolation Forest to detect global anomalies while also improving the accuracy of detecting local anomalies.
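The first method's pipeline (hash the data with E2LSH-style random projections, then isolate anomalies in the hashed space) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the bucket width `w`, the number of hash functions `k`, and all function names are hypothetical; buckets are keyed directly by the k-tuple of hash values instead of being compressed with E2LSH's primary/secondary hashes; the abstract does not say how the per-bucket sub-data sets are scored, so the sketch scores all reduced points in one forest; and the thesis's mean-based split optimization is not reproduced (splits are random, as in the standard Isolation Forest).

```python
import math
import random
from collections import defaultdict

# Hypothetical sketch: p-stable (E2LSH-style) hashing, then a plain Isolation Forest.
# Splits are random as in standard iForest, NOT the thesis's mean-based optimisation.

def e2lsh_codes(data, k=6, w=4.0, seed=0):
    """Map each point v to k integers floor((a . v + b) / w), a ~ N(0, I), b ~ U[0, w)."""
    rng = random.Random(seed)
    dim = len(data[0])
    funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
             for _ in range(k)]
    return [tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
                  for a, b in funcs)
            for v in data]

def group_buckets(codes):
    """Bucket split: points whose full k-tuple codes agree share a bucket."""
    buckets = defaultdict(list)
    for idx, code in enumerate(codes):
        buckets[code].append(idx)
    return buckets

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split on a random attribute at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    # Only attributes that still vary can be split (hashed codes repeat often).
    dims = [j for j in range(len(rows[0]))
            if min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not dims:
        return ("leaf", len(rows))
    j = rng.choice(dims)
    lo, hi = min(r[j] for r in rows), max(r[j] for r in rows)
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[j] < split]
    right = [r for r in rows if r[j] >= split]
    return ("node", j, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(x, node, depth=0):
    """Edges from the root to x's leaf, plus c(size) for points left in that leaf."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, j, split, left, right = node
    return path_length(x, left if x[j] < split else right, depth + 1)

def isolation_scores(data, n_trees=100, sample_size=64, seed=0):
    """Score s(x) = 2 ** (-E[h(x)] / c(psi)); values closer to 1 are more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
            for x in data]

if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(300)]
    data.append([20.0] * 10)  # planted global anomaly at index 300
    codes = e2lsh_codes(data)
    print("buckets:", len(group_buckets(codes)))
    scores = isolation_scores(codes)
    print("most anomalous index:", max(range(len(scores)), key=scores.__getitem__))
```

Keying buckets by the exact k-tuple keeps the sketch short; a full E2LSH index would also build several independent hash tables to raise the chance that true neighbours collide.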
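The second method kernelizes the data with Gaussian-kernel LSH so that it lives in a high-dimensional feature space before the mean-optimized Isolation Forest is applied. The sketch below swaps in a different, well-known technique as a stand-in: random Fourier features, which give an explicit approximate feature map z(x) with z(x) · z(y) ≈ exp(-||x - y||² / (2σ²)), followed by sign-random-projection (SimHash) bits as kernel hash values. It is not the kernel LSH construction used in the thesis; the bandwidth `sigma`, the feature count, the bit count, and every function name are hypothetical, and the Isolation Forest stage is not repeated here.

```python
import math
import random

# Stand-in technique: random Fourier features + sign random projections.
# This is NOT the thesis's kernel LSH; sigma, n_features and n_bits are hypothetical.

def make_rff(dim, n_features, sigma, rng):
    """Draw (omega, phase) pairs so that z(x) . z(y) ~ exp(-||x - y||^2 / (2 sigma^2))."""
    omegas = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
              for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return omegas, phases

def rff_map(x, omegas, phases):
    """Explicit map into the (approximate) Gaussian-kernel feature space."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(omegas, phases)]

def sign_hash(z, planes):
    """One bit per random hyperplane: 1 if z lies on its non-negative side."""
    return tuple(1 if sum(pi * zi for pi, zi in zip(p, z)) >= 0.0 else 0
                 for p in planes)

def gaussian_kernel(x, y, sigma):
    """Exact Gaussian (RBF) kernel, used only to check the approximation."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))

if __name__ == "__main__":
    rng = random.Random(7)
    dim, n_features, sigma, n_bits = 5, 2000, 1.0, 16
    omegas, phases = make_rff(dim, n_features, sigma, rng)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_bits)]
    x = [0.1, 0.2, 0.0, -0.1, 0.3]
    y = [0.2, 0.1, 0.1, -0.2, 0.2]
    far = [5.0, 5.0, 5.0, 5.0, 5.0]
    zx, zy, zf = (rff_map(p, omegas, phases) for p in (x, y, far))
    for name, other, z_other in (("near", y, zy), ("far", far, zf)):
        approx = sum(a * b for a, b in zip(zx, z_other))
        print(name, round(approx, 3), round(gaussian_kernel(x, other, sigma), 3))
    print("codes:", sign_hash(zx, planes), sign_hash(zf, planes))
```

In this feature space, neighbours within the kernel bandwidth keep similarity near 1 while points outside it are pushed toward 0, which is one way to read the abstract's idea of turning local anomalies into global ones; the features or bit codes would then be handed to an Isolation Forest.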
Keywords/Search Tags: data mining, anomaly detection, Isolation Forest, hashing algorithm, kernel function