
Towards self-adaptive anomaly detection sensors

Posted on: 2011-10-08
Degree: Ph.D.
Type: Thesis
University: Columbia University
Candidate: Ciocarlie, Gabriela F.
Full Text: PDF
GTID: 2448390002464150
Subject: Computer Science
Abstract/Summary:
Spurred by the ever-growing availability of online services and resources, threat models are constantly evolving. As a result, security techniques that were sufficient a decade, or even a few years, ago can prove inadequate today. In particular, recent advances in polymorphic attacks and the increasing volume of zero-day attacks threaten to overwhelm signature-based defense mechanisms. As attackers find new ways to gain access to networks and systems, defense mechanisms must find new ways to protect them.

Anomaly Detection (AD) sensors provided a breakthrough in the defense against polymorphic and zero-day attacks by relying on models of normal behavior rather than signatures of malicious input. However, as AD-based approaches are increasingly introduced as first-class defensive techniques, a number of open problems regarding their deployment and maintenance remain. The efficacy of AD sensors depends heavily on the quality of the data used to train them, and artificial or contrived training data may not provide a realistic view of the deployment environment. Most realistic data sets contain a number of attacks or anomalous events, and their size makes manual removal of attack data infeasible. As a result, sensors trained on such data can miss attacks and their variations. Another roadblock to widespread adoption of AD sensors is that their deployment and maintenance often require significant intervention by a human expert to manually optimize their performance and keep them up to date with changes in the system.

In this thesis, we attempt to address these challenges by introducing a set of methods for self-sanitizing, self-calibrating, and self-updating AD sensors. Our overall goal is to introduce a general framework for a class of AD sensors that can automatically adapt to the system under protection, combining detection performance with ease of deployment and operation.

We begin by extending the training phase for a class of content-based AD sensors to include a novel sanitization phase that significantly improves the detection performance of these sensors, in a manner agnostic to the underlying AD algorithm. This phase generates multiple models conditioned on small slices of the training data. We use these "micro-models" to produce provisional labels for each training input, and we combine the micro-models in a voting scheme to determine which parts of the training data may represent attacks. Our results suggest that this phase automatically and significantly improves the quality of unlabeled training data by making it as "attack-free" and "regular" as possible in the absence of absolute ground truth.
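To make the voting scheme concrete, the following Python sketch is a minimal, self-contained rendering of the idea. It is our own illustration rather than the thesis implementation: the `MicroModel` n-gram content model, the `slice_size` and `vote_threshold` parameters, and the helper names are all assumptions standing in for whatever underlying AD algorithm the sensor uses.

```python
# Hypothetical sketch of micro-model-based training-data sanitization.
# MicroModel stands in for any content-based AD algorithm exposing
# fit() and is_abnormal(); both names are assumptions, not the thesis API.

class MicroModel:
    """Toy content model: remembers the byte n-grams seen in its slice."""
    def __init__(self, n=3):
        self.n, self.grams = n, set()

    def fit(self, inputs):
        for data in inputs:
            self.grams.update(data[i:i + self.n]
                              for i in range(len(data) - self.n + 1))

    def is_abnormal(self, data, max_new=0.2):
        grams = [data[i:i + self.n] for i in range(len(data) - self.n + 1)]
        if not grams:
            return False
        new = sum(1 for g in grams if g not in self.grams)
        return new / len(grams) > max_new  # too many unseen n-grams


def sanitize(training_inputs, slice_size=500, vote_threshold=0.5):
    """Return the subset of training_inputs judged attack-free by voting."""
    # 1. Build one micro-model per small, time-ordered slice of the data.
    slices = [training_inputs[i:i + slice_size]
              for i in range(0, len(training_inputs), slice_size)]
    models = []
    for s in slices:
        m = MicroModel()
        m.fit(s)
        models.append(m)

    # 2. Each input gets a provisional label from every micro-model trained
    #    on a *different* slice; a vote decides whether the input is kept.
    clean = []
    for idx, data in enumerate(training_inputs):
        own_slice = idx // slice_size
        voters = [m for j, m in enumerate(models) if j != own_slice]
        votes = sum(m.is_abnormal(data) for m in voters)
        if votes / max(len(voters), 1) <= vote_threshold:
            clean.append(data)  # most micro-models consider it normal
    return clean
```

Because the voting step only asks each micro-model for a normal/abnormal verdict, the surrounding logic stays agnostic to the underlying AD algorithm, which is the property the sanitization phase relies on.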
We then study the performance issues that stem from fully automating the AD sensors' calibration and long-term maintenance. Our goal is to remove the dependence on human operators by using an unlabeled, and thus potentially dirty, sample of incoming traffic. To that end, we propose to enhance the training phase of AD sensors with a self-calibration phase that can be employed in conjunction with the sanitization technique, resulting in a fully automated AD maintenance cycle. These techniques can be applied in an online fashion to ensure that the resulting AD models reflect changes in the system's behavior that would otherwise render the sensor's internal state inconsistent. We verify the validity of our approach through a series of experiments in which we compare manually obtained optimal parameters with those computed by the self-calibration phase. When modeling traffic from two different sources, the fully automated calibration achieves performance comparable to that obtained using the optimal parameters. Finally, our adaptive models outperform statically generated ones, retaining the performance gains of the sanitization process over time.

The race between attacker and defender for access to the protected system is one of skill, information, and resources. The methods discussed so far improve the performance of local AD sensors through better data analysis and training methods. However, a well-equipped attacker, armed with intimate knowledge of the protected system as well as extensive resources, can still attempt training attacks that change the normal input patterns. To cope with this possibility, we extend our methodology to support sharing models of abnormal traffic among multiple collaborating sites. We show that if one site is able to capture an attack in its abnormal model, all collaborators can benefit via model exchange. Our framework makes this possible by defining a number of model operations that can be implemented for a wide range of AD sensors.
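As a rough illustration of model exchange, the sketch below assumes an abnormal model can be represented as a set of flagged content fragments, with a merge operation to fold in a collaborator's model. Both the representation and the operation names (`merge`, `matches`) are our assumptions; the thesis defines its model operations abstractly so they can be implemented for many AD sensors.

```python
# Hypothetical sketch of cross-site model exchange. Representing an
# abnormal model as a set of flagged content n-grams is a simplifying
# assumption, not the thesis's general formulation.

from dataclasses import dataclass, field


@dataclass
class AbnormalModel:
    """Abnormal-traffic model: content fragments flagged as attack-like."""
    grams: set = field(default_factory=set)

    def merge(self, other: "AbnormalModel") -> None:
        """Fold a collaborator's abnormal model into the local one."""
        self.grams |= other.grams

    def matches(self, data: bytes, n: int = 3) -> bool:
        """Flag input whose content overlaps the abnormal model."""
        return any(data[i:i + n] in self.grams
                   for i in range(len(data) - n + 1))


# One site captures an attack in its abnormal model...
site_a = AbnormalModel({b"\x90\x90\x90"})   # e.g. a NOP-sled fragment
site_b = AbnormalModel()

# ...and after exchanging models, the collaborator benefits as well.
site_b.merge(site_a)
assert site_b.matches(b"GET /\x90\x90\x90\x90 HTTP/1.0")
```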
Keywords/Search Tags: AD sensors, Model, Detection, Training data