
Local-oriented Data Stream Abnormal Outlier Mining Algorithms And Applications Dynamically

Posted on: 2011-12-28   Degree: Master   Type: Thesis
Country: China   Candidate: K Gao
GTID: 2208360308462924   Subject: Computer software and theory
Abstract/Summary:
Most data mining algorithms are concerned with finding "large patterns", whereas outlier detection algorithms look for "small patterns" in a data set. An outlier is an observation that deviates so much from the other observations as to arouse suspicion that it was generated by a different mechanism. The outlier detection task can be stated as follows: given a data set of N data points and the expected number of outliers n, find the top-n points that are most inconsistent with, anomalous relative to, or significantly dissimilar from the remaining points. Outlier detection is an important branch of data mining and has been applied in many fields. It is especially relevant to network intrusion detection, because intrusion behavior differs from normal behavior; we therefore study outlier detection algorithms and apply them to network intrusion detection.

The LOF algorithm can find outliers in regions of differing density. Because it assigns each data record a degree of being an outlier rather than a yes/no label, it is closer to the definition of an outlier given above. Recently, a class of data-intensive applications known as data streams has become widely recognized: the data are not modeled as persistent relations, and the data elements arrive continuously, rapidly, and in large volume. Network connection records, which arrive continuously, are a form of streaming data. However, the original static LOF algorithm has high time complexity and cannot adapt to changes in a data stream, so it is not suitable for real-time data stream mining. We study how to identify and detect anomalies accurately in a data stream environment and propose n-IncLOF, a dynamic local outlier detection algorithm that adjusts the n-threshold adaptively. n-IncLOF is based on the local outlier detection algorithm.
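The top-n task and the LOF score described above can be illustrated with a minimal static sketch. It follows the standard LOF definitions (k-distance, reachability distance, local reachability density) under two simplifying assumptions: each point uses exactly k nearest neighbours (ties broken by index), and the data contain no duplicate points. The function names `lof_scores` and `top_n_outliers` are hypothetical and this is not the thesis's code. Its brute-force neighbour search costs roughly O(N² log N), which hints at why recomputing static LOF on every stream arrival is expensive.

```python
import math

def knn(data, i, k):
    """Indices of the k nearest neighbours of data[i] (ties broken by index)."""
    others = sorted((math.dist(data[i], data[j]), j)
                    for j in range(len(data)) if j != i)
    return [j for _, j in others[:k]]

def lof_scores(data, k):
    """Static LOF for every point; assumes distinct points and len(data) > k."""
    n = len(data)
    neigh = [knn(data, i, k) for i in range(n)]
    # k-distance: distance from a point to its k-th nearest neighbour
    kdist = [math.dist(data[i], data[neigh[i][-1]]) for i in range(n)]

    def reach_dist(i, j):
        # reachability distance of point i with respect to neighbour j
        return max(kdist[j], math.dist(data[i], data[j]))

    # local reachability density: inverse of the mean reachability distance
    lrd = [k / sum(reach_dist(i, j) for j in neigh[i]) for i in range(n)]
    # LOF: mean lrd of the neighbours divided by the point's own lrd
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]

def top_n_outliers(data, k, n):
    """Indices of the n points with the largest LOF scores."""
    scores = lof_scores(data, k)
    return sorted(range(len(data)), key=lambda i: scores[i], reverse=True)[:n]
```

For example, with `data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]` and k = 3, the points in the unit square score close to 1 while the isolated point (10, 10) scores far higher, so `top_n_outliers(data, 3, 1)` returns `[5]`.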
Because outliers are distributed unevenly across a data stream, we propose an adjustment function for the n-threshold. We also analyze how the algorithm handles the insertion, deletion, and modification of data points, give a description of the n-IncLOF algorithm, and analyze its complexity.

We have also designed OutlierDIDS, an anomaly detection system that uses n-IncLOF as its detection engine and draws on both host and network properties. Outlier detection experiments on the KDD CUP99 data stream demonstrate the validity of n-IncLOF: compared with the original algorithm, it significantly increases the detection rate while also reducing the false alarm rate. The experiments also demonstrate the feasibility of OutlierDIDS in terms of effectiveness, adaptability, and real-time performance.
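The abstract does not give the details of how n-IncLOF handles insertion, deletion, or modification, nor its n-threshold adjustment function. As a point of reference only, the sketch below shows insertion-only incremental LOF following the general scheme of Pokrajac, Lazarevic and Latecki (2007): when a point arrives, only the k-distances, local reachability densities, and LOF scores that the insertion can change are recomputed. The class `IncrementalLOF` and its methods are hypothetical; the sketch assumes distinct points, exactly k neighbours per point, and brute-force neighbour search, and it implements neither deletion nor an adaptive n-threshold.

```python
import math

class IncrementalLOF:
    """Hypothetical insertion-only incremental LOF sketch (brute-force
    neighbour search; assumes distinct points, exactly k neighbours each)."""

    def __init__(self, k):
        self.k = k
        self.points = []  # stored points as tuples
        self.nn = []      # nn[i]: indices of the k nearest neighbours of i
        self.kdist = []   # kdist[i]: distance from i to its k-th neighbour
        self.lrd = []     # local reachability density of each point
        self.lof = []     # local outlier factor of each point

    def _d(self, i, j):
        return math.dist(self.points[i], self.points[j])

    def _knn(self, i):
        # ties broken by index, so results are deterministic
        cand = sorted((self._d(i, j), j) for j in range(len(self.points)) if j != i)
        return [j for _, j in cand[: self.k]]

    def _rknn(self, i):
        # reverse k-NN: every point that has i among its k nearest neighbours
        return {j for j in range(len(self.points)) if i in self.nn[j]}

    def _update_lrd(self, i):
        reach = sum(max(self.kdist[j], self._d(i, j)) for j in self.nn[i])
        self.lrd[i] = self.k / reach

    def _update_lof(self, i):
        self.lof[i] = sum(self.lrd[j] for j in self.nn[i]) / (self.k * self.lrd[i])

    def rebuild(self):
        """Static LOF over all stored points (used for warm-up and checking)."""
        n = len(self.points)
        self.nn = [self._knn(i) for i in range(n)]
        self.kdist = [self._d(i, self.nn[i][-1]) for i in range(n)]
        self.lrd, self.lof = [0.0] * n, [0.0] * n
        for i in range(n):
            self._update_lrd(i)
        for i in range(n):
            self._update_lof(i)

    def insert(self, p):
        c = len(self.points)
        self.points.append(tuple(p))
        self.nn.append([])
        self.kdist.append(math.inf)
        self.lrd.append(0.0)
        self.lof.append(0.0)
        if c < self.k:       # at most k points stored: LOF is undefined
            return
        if c == self.k:      # LOF becomes defined: compute it statically once
            self.rebuild()
            return
        # Step 1: k-NN and k-distance of the new point c.
        self.nn[c] = self._knn(c)
        self.kdist[c] = self._d(c, self.nn[c][-1])
        # Step 2: points whose k-NN now include c (c is strictly closer than
        # their old k-th neighbour); their neighbour lists and k-distances change.
        upd_kdist = {j for j in range(c) if self._d(j, c) < self.kdist[j]}
        for j in upd_kdist:
            self.nn[j] = self._knn(j)
            self.kdist[j] = self._d(j, self.nn[j][-1])
        # Step 3: lrd must be recomputed for those points, for c itself, and for
        # any i whose reachability distance to a changed j may differ
        # (such an i lies in nn[j] and has j in nn[i]).
        upd_lrd = set(upd_kdist) | {c}
        for j in upd_kdist:
            for i in self.nn[j]:
                if i != c and j in self.nn[i]:
                    upd_lrd.add(i)
        for m in upd_lrd:
            self._update_lrd(m)
        # Step 4: LOF changes for points whose own lrd changed and for every
        # point that has such a point among its neighbours.
        upd_lof = set(upd_lrd)
        for m in upd_lrd:
            upd_lof |= self._rknn(m)
        for l in upd_lof:
            self._update_lof(l)
```

One way to check a sketch like this is to compare it against full recomputation: after inserting a stream of points into one instance, copy its `points` into a fresh instance, call `rebuild()`, and the two `lof` lists should agree.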
Keywords/Search Tags: Outlier, n-threshold, Data Streams, Data Mining, IDS