Representing context-dependent categorical and mixed-value data systems for fault and anomaly detection: A highly scalable variable length Markov approach

Posted on:2010-10-24

Degree:Ph.D

Type:Dissertation

University:Stevens Institute of Technology

Candidate:Brice, Pierre

GTID:1448390002986100

Subject:Engineering

Abstract/Summary:

Complex systems, meaning systems built from a large number of interconnected autonomous subsystems, often accept as input and generate as output a mixture of context-sensitive categorical and numerical data. Examples include software-intensive, interconnected embedded systems such as those found in telecommunication networks and enterprise operations information systems. Manual techniques for modeling such systems for fault isolation and anomaly detection are generally labor intensive and time consuming, and as the systems grow in scope these techniques are approaching the limits of their usefulness. Data mining techniques are automatic techniques often used to extract knowledge from data in business applications and databases. However, little effort has been spent on using them to extract models of the systems described above for anomaly detection and fault isolation, and where such work has been done, the data domain has been restricted to numerical information. As the number of systems with abundant categorical information in their ever-expanding logs and transaction traces increases, the need to apply automatic techniques to model them for prediction and fault isolation becomes more compelling.

This research investigates the use of variable-length Markov methods to model data for prediction and anomaly detection in large, multi-variable, context-sensitive categorical and mixed-data systems. It applies statistical learning techniques, previously restricted to industrial process control methods dealing exclusively with numerical data, to context-sensitive categorical and mixed-data systems. It also extends data mining and clustering techniques, hitherto geared toward database data, to systems in which most functionality is implemented in software and whose event logs and call traces are full of categorical information, for the purpose of fault isolation and anomaly detection. Data mining can therefore be applied to software debugging and other anomaly detection problems. Furthermore, the research seeks to develop automatic methods that scale over very large sets of variables, which enhances their viability for practical applications.
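
As a rough illustration of the general idea of a variable-length Markov model over categorical events, the sketch below counts which symbol follows each context of up to `max_order` preceding symbols, then scores a new event by its surprise (negative log probability) under the longest context seen often enough. It is not the dissertation's algorithm: the class name `VariableLengthMarkov`, the parameters `max_order`, `min_count`, and `alpha`, the add-alpha smoothing, and the toy event names are all assumptions made for this example.

```python
import math
from collections import defaultdict


class VariableLengthMarkov:
    """Toy variable-length Markov model over categorical symbols (illustrative only)."""

    def __init__(self, max_order=2, min_count=2, alpha=1.0):
        self.max_order = max_order  # longest context length considered
        self.min_count = min_count  # minimum observations to trust a context
        self.alpha = alpha          # add-alpha smoothing constant (assumed)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def fit(self, sequence):
        """Count next-symbol occurrences for every context of length 0..max_order."""
        self.alphabet.update(sequence)
        for i, symbol in enumerate(sequence):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[tuple(sequence[i - k:i])][symbol] += 1
        return self

    def _context_for(self, history):
        """Return the longest suffix of history observed at least min_count times."""
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            if sum(self.counts.get(context, {}).values()) >= self.min_count:
                return context
        return ()

    def prob(self, history, symbol):
        """Smoothed probability of symbol given the selected context."""
        nexts = self.counts.get(self._context_for(history), {})
        total = sum(nexts.values())
        size = len(self.alphabet | {symbol})
        return (nexts.get(symbol, 0) + self.alpha) / (total + self.alpha * size)

    def surprise(self, sequence):
        """Per-event surprise in bits; larger values suggest anomalies."""
        return [-math.log2(self.prob(sequence[:i], s)) for i, s in enumerate(sequence)]


# Hypothetical log of categorical events with a regular pattern.
training_log = ["open", "read", "close"] * 20
model = VariableLengthMarkov(max_order=2).fit(training_log)

normal_scores = model.surprise(["open", "read", "close"])
odd_scores = model.surprise(["open", "close", "read"])  # "close" right after "open"
```

In this toy setting the out-of-order `close` event receives a much higher surprise than any event in the regular trace, which is the kind of signal an anomaly detector could threshold. Choosing the context length per event, instead of fixing one order, is what keeps the number of stored contexts manageable as the alphabet and the number of variables grow.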

Keywords/Search Tags:

Systems, Anomaly detection, Data, Categorical, Fault, Large, Techniques

Related items

1	Fault detection and diagnosis for large-scale systems
2	Research And Implementation Of An Anomaly Detection Platform For Large-scale Software Systems Based On Large Collections Of Log Messages
3	Robust Fault Detection For Large Scale Systems Under Network Environment
4	Research On Data Anomaly Detection Method Based On Heterogeneous System
5	Artificial intelligence techniques applied to fault detection systems
6	Research On The Large Scale Clustering And Its Applications On Anomaly Detection
7	Large-scale Network Anomaly Detection Based On Data Mining
8	Studies On Clustering Algorithms For Categorical Data
9	Research On Key Techniques Of Anomaly Detection For Big Data Platform Based On Dynamical Rule Base
10	Techniques for Fault Detection and Visualization of Telemetry Dependence Relationships for Root Cause Fault Analysis in Complex Systems