Isolation Forest Algorithm Based On Qualitative Data Clustering

Posted on:2022-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:M H Chen

Full Text:PDF

GTID:2518306539981429

Subject:Software engineering

Abstract/Summary:

With the advent of the big data era, data can be collected ever more efficiently. How to identify outliers, that is, samples that differ markedly from the rest, in massive data has become an important issue in production activities. Many anomaly detection schemes have been proposed to solve the problem of outlier recognition. However, these methods have various defects, such as requiring a massive dataset for training or relying heavily on parameter selection. Compared with other anomaly detection algorithms, the Isolation Forest has several advantages, such as low time complexity, the need for only small datasets for training, and few parameters to select. Its weakness is that the detection results may be inaccurate, because attributes are selected at random to divide samples during training. To address this problem, this thesis uses rough set theory to judge the importance of different attributes and combines it with the Isolation Forest algorithm, proposing an Isolation Forest algorithm based on Qualitative Data Clustering. The specific work is as follows:

(1) In the Isolation Forest, the attribute used to divide samples by value is selected completely at random. When an Isolation tree is constructed this way, attributes that have a great influence on the results may be ignored and attributes with a low influence chosen instead, which leads to inaccurate detection results. This thesis adopts the theory from Qualitative Data Clustering that uses clustering results to calculate the importance of different attributes to an information system, and screens out the relatively important attributes for constructing Isolation trees. Experiments show that this method improves on the methods it is compared with.

(2) To demonstrate the effectiveness of the method on practical problems, a real credit card dataset is used to detect fraudulent transactions with the proposed method. In this process, for datasets of different magnitudes, the method refines the implementation details of calculating attribute importance: the dataset is divided into several sub-datasets, the importance on each sub-dataset is calculated by sampling, and the overall attribute importance is obtained by integrating the multiple results. Finally, experiments demonstrate the effectiveness of the method.
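The attribute-selection idea in (1) and the sub-dataset aggregation in (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The importance weights are taken as given inputs, since the rough-set and clustering computation that produces them is not described in this abstract and is not reproduced here. Weighted random sampling of the split attribute stands in for the thesis's screening step, and a plain mean stands in for the unspecified rule for integrating sub-dataset results. All function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def average_importance(per_subset):
    """Combine importance vectors from several sub-datasets by a plain mean (assumed rule)."""
    count = len(per_subset)
    return [sum(column) / count for column in zip(*per_subset)]


def build_tree(rows, weights, depth, max_depth, rng):
    """Grow one isolation tree; the split attribute is drawn in proportion to its weight."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    # Only positively weighted attributes that still vary in this node can split it.
    usable = [j for j in range(len(weights))
              if weights[j] > 0 and min(r[j] for r in rows) < max(r[j] for r in rows)]
    if not usable:
        return {"size": len(rows)}
    attr = rng.choices(usable, weights=[weights[j] for j in usable], k=1)[0]
    values = [r[attr] for r in rows]
    split = rng.uniform(min(values), max(values))
    left = [r for r in rows if r[attr] < split]
    right = [r for r in rows if r[attr] >= split]
    return {
        "attr": attr,
        "split": split,
        "left": build_tree(left, weights, depth + 1, max_depth, rng),
        "right": build_tree(right, weights, depth + 1, max_depth, rng),
    }


def path_length(tree, point):
    """Depth at which the point reaches a leaf, plus the usual leaf-size correction."""
    depth = 0
    while "attr" in tree:
        tree = tree["left"] if point[tree["attr"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def fit_forest(data, weights, n_trees=100, sample_size=256, seed=0):
    """Build n_trees trees, each on a random subsample of the data (a list of tuples)."""
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    if size < 2:
        raise ValueError("need at least two samples")
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(data, size), weights, 0, max_depth, rng)
             for _ in range(n_trees)]
    return {"trees": trees, "sample_size": size}


def anomaly_score(forest, point):
    """Standard isolation-forest score in (0, 1); higher means more anomalous."""
    trees = forest["trees"]
    mean_path = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(forest["sample_size"]))
```

In this sketch, passing equal weights recovers the uniformly random attribute choice of the standard Isolation Forest, while a weight of zero removes an attribute from tree construction entirely. A caller would compute one importance vector per sub-dataset, merge them with `average_importance`, and pass the result to `fit_forest`.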

Keywords/Search Tags:

Anomaly Detection, Isolation Forest, Rough Set, Clustering Algorithm

Related items

1	Research On Intrusion Detection Method Based On Isolation Forest
2	Research On Parallelization Of Isolation Forest Algorithm Based On Spark
3	Research On Online Anomaly Detection Method Of Network Data Stream Based On Isolation Forest
4	Research Of Anomaly Detection Method Based On Hash Mapping And Isolation Principle
5	Research On Anomaly Detection Based On Ensemble Learning Algorithms
6	The Key Technology Of ADS-B Data Organization And Analysis Based On Hadoop
7	Outsourcing Computation Of Privacy Preserving Anomaly Detection Algorithm Based On Secure Multiparty Computation
8	Application Research Of Outlier Anomaly Detection Technology For Time Series Data
9	Research And Development Of Network Traffic Anomaly Detection And Isolation System
10	Research On Anomaly Detection Based On Linux Process Behavior