
Knowledge discovery in databases for intrusion detection, disease classification and beyond

Posted on: 2002-07-25
Degree: Ph.D
Type: Thesis
University: New York University
Candidate: Berger, Gideon Lee
GTID: 2468390011996682
Subject: Computer Science
Abstract/Summary:
As the number of networked computers and the amount of sensitive information available on them grow, there is an increasing need to ensure the security of these systems. Passwords and encryption have, for some time, provided an important initial defense. A clever and malicious individual can, however, often circumvent these defenses. Intrusion detection is therefore needed as another way to protect computer systems.

This thesis describes a novel three-stage algorithm for building classification models in the presence of nonstationary, temporal, high-dimensional data in general, and for detecting network intrusions in particular. Given a set of training records, the algorithm begins by identifying “interesting” temporal patterns in the data using a modal logic. This distinguishes the approach from other work in this area, which identifies frequent patterns. We show that when frequency is replaced by our measure of “interestingness”, the problem of finding temporal patterns is NP-complete. We then offer an efficient heuristic approach that has proven experimentally effective.

Once identified, the interesting patterns become the predictor variables in the construction of a Multivariate Adaptive Regression Splines (MARS) model. This choice is justified by the model's ability to capture complex nonlinear relationships between the predictor and response variables, which is comparable to that of other heuristic approaches such as neural networks and classification trees, while offering improved computational properties such as rapid convergence and interpretability.

After considering several approaches to the problems of overfitting, which is inherent when modeling high-dimensional data, and of nonstationarity, we describe our approach to addressing these issues through the use of truncated Stein shrinkage. This approach is motivated by showing the inadmissibility of the maximum likelihood estimator (MLE) for high-dimensional (dimension ≥ 3) data.

We then discuss the application of our approach as participants in the 1999 DARPA Intrusion Detection Evaluation, where we exhibited the benefits of our approach.

Finally, we suggest another area of research where we believe that our work would meet with similar success, namely, the area of disease classification.
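
To make the shrinkage step concrete, the following is a minimal sketch of the textbook positive-part James-Stein estimator, which is one common form of truncated Stein shrinkage. It is an illustration only: the abstract does not specify the exact variant used in the thesis, and the function name `positive_part_james_stein`, the `sigma2` parameter, and the assumption of independent Gaussian noise with known variance are all introduced here. For an observation x of dimension p ≥ 3, the MLE of the mean is x itself, while the estimator below scales x toward the origin by max(0, 1 - (p - 2)·sigma2 / ||x||²).

```python
from typing import List, Sequence


def positive_part_james_stein(x: Sequence[float], sigma2: float = 1.0) -> List[float]:
    """Shrink an observed vector toward the origin.

    Hypothetical helper for illustration: assumes x is a single observation
    of a p-dimensional Gaussian mean with known noise variance sigma2.
    """
    p = len(x)
    if p < 3:
        # The classical inadmissibility result for the MLE needs p >= 3.
        raise ValueError("James-Stein shrinkage requires dimension >= 3")

    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        # Nothing to shrink; also avoids division by zero.
        return [0.0] * p

    # Truncation at zero keeps the factor from flipping the sign of x.
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / norm_sq)
    return [factor * v for v in x]


if __name__ == "__main__":
    print(positive_part_james_stein([3.0, 4.0, 0.0]))
    print(positive_part_james_stein([0.5, 0.5, 0.5]))
```

In the first call the squared norm is 25 and p - 2 = 1, so the shrinkage factor is 1 - 1/25 = 0.96. In the second call the untruncated factor would be negative, so the truncation sets the estimate to the zero vector.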
Keywords/Search Tags: Classification, Intrusion detection, Data