Knowledge discovery in databases for intrusion detection, disease classification and beyond

Posted on:2002-07-25

Degree:Ph.D

Type:Thesis

University:New York University

Candidate:Berger, Gideon Lee

Full Text:PDF

GTID:2468390011996682

Subject:Computer Science

Abstract/Summary:

As the number of networked computers and the amount of sensitive information available on them grow, there is an increasing need to ensure the security of these systems. Passwords and encryption have, for some time, provided an important initial defense. Given a clever and malicious individual, however, these defenses can often be circumvented. Intrusion detection is therefore needed as another way to protect computer systems.

This thesis describes a novel three-stage algorithm for building classification models in the presence of nonstationary, temporal, high-dimensional data in general, and for detecting network intrusions in particular. Given a set of training records, the algorithm begins by identifying “interesting” temporal patterns in this data using a modal logic. This approach is distinguished from other work in this area, where frequent patterns are identified. We show that when frequency is replaced by our measure of “interestingness”, the problem of finding temporal patterns is NP-complete. We then offer an efficient heuristic approach that has proven experimentally effective.

Once interesting patterns have been identified, they become the predictor variables in the construction of a Multivariate Adaptive Regression Splines (MARS) model. This approach is justified by its ability to capture complex nonlinear relationships between the predictor and response variables, which is comparable to that of other heuristic approaches such as neural networks and classification trees, while offering improved computational properties such as rapid convergence and interpretability.

After considering several approaches to the problems of overfitting, which is inherent when modeling high-dimensional data, and of nonstationarity, we describe our approach to addressing these issues through the use of truncated Stein shrinkage. This approach is motivated by showing the inadmissibility of the maximum likelihood estimator (MLE) for high-dimensional (dimension ≥ 3) data.

We then discuss the application of our approach as participants in the 1999 DARPA Intrusion Detection Evaluation, where we exhibited the benefits of our approach.

Finally, we suggest another area of research where we believe that our work would meet with similar success, namely, the area of disease classification.
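
The shrinkage idea mentioned in the abstract can be illustrated with the textbook positive-part James-Stein estimator, which dominates the MLE for a multivariate normal mean when the dimension is at least 3. This is a minimal sketch under stated assumptions, not the thesis's own procedure: the function name `positive_part_james_stein`, the shrink-toward-zero target, and the known common variance `sigma2` are all illustrative choices, and the "truncated Stein shrinkage" used in the thesis may differ in detail.

```python
import numpy as np


def positive_part_james_stein(x, sigma2=1.0):
    """Shrink one observation x ~ N(theta, sigma2 * I) toward the origin.

    Hypothetical helper for illustration only. The shrinkage factor
    1 - (p - 2) * sigma2 / ||x||^2 is truncated at zero so that it never
    flips the sign of the estimate (the "positive-part" rule).
    """
    x = np.asarray(x, dtype=float)
    p = x.size
    if p < 3:
        # Below dimension 3 the MLE (x itself) is admissible; leave it alone.
        return x.copy()
    norm2 = float(x @ x)
    if norm2 == 0.0:
        # Nothing to shrink; also avoids division by zero.
        return x.copy()
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm2)
    return factor * x
```

For example, with `x = [3, 4, 0, 0]` and unit variance, the squared norm is 25 and the dimension is 4, so the factor is 1 - 2/25 = 0.92 and every coordinate is scaled by that amount. When the observation lies close to the origin, the untruncated factor would be negative, and the truncation returns the zero vector instead.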

Keywords/Search Tags:

Classification, Intrusion detection, Data

Related items

1	Knowledge discovery in databases for intrusion detection, disease classification and beyond
2	Web Intrusion Detection Based On Imbalanced Data Classification Method
3	Network Intrusion Detection System Based On Data Mining
4	Classification And Visualization Of Abnormal Data In Intrusion Detection
5	Network Intrusion Detection Technology Research And Application
6	Research Of Intrusion Detection Model Based On Data Stream Feature Selection And Classification Algorithm
7	Research On Unbalanced Intrusion Data Detection Based On Oversampling
8	Research On Mixed Network Intrusion Detection Based On Data Mining
9	Research On Intrusion Detection Method With Improved Random Decision Tree
10	Research For Intrusion Detection Based On Data Mining Technology