Representing context-dependent categorical and mixed-value data systems for fault and anomaly detection: A highly scalable variable length Markov approach

Posted on: 2010-10-24
Degree: Ph.D.
Type: Dissertation
University: Stevens Institute of Technology
Candidate: Brice, Pierre
Full Text: PDF
GTID: 1448390002986100
Subject: Engineering
Abstract/Summary:
Complex systems (systems made up of the interconnection of a large number of autonomous subsystems) often accept as input and generate as output a mixture of context-sensitive categorical and numerical data. Examples include software-intensive, interconnected embedded systems such as those encountered in telecommunication networks and enterprise operations information systems. Manual techniques for modeling such systems for fault isolation and anomaly detection are generally labor intensive and time consuming, and as these systems grow in scope, the manual techniques are approaching the limits of their usefulness. Data mining techniques are automatic techniques often used to extract knowledge from data in business applications and databases. However, little effort has been spent on using such techniques to extract models of systems like those mentioned above for anomaly detection and fault isolation. Where such work has been done, the data domain has been restricted to numerical information. As the number of systems with abundant categorical information in their ever-expanding logs and transaction traces increases, the need to apply automatic techniques to model them for prediction and fault isolation becomes more compelling.

This research investigates the use of variable-length Markov methods to model data for prediction and anomaly detection in large, multi-variable, context-sensitive categorical and mixed-data systems. It applies statistical learning techniques, previously restricted to industrial process control methods dealing exclusively with numerical data, to model context-sensitive categorical and mixed-data systems. It extends data mining and clustering techniques, hitherto geared toward database data, to model systems in which most functionality is implemented in software and whose event logs and call traces are full of categorical information, for fault isolation and anomaly detection. Data mining can therefore be applied to software debugging and other anomaly detection problems. Furthermore, the research seeks to develop automatic methods that scale to very large sets of variables, which enhances their viability for practical applications.
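
As a rough illustration of the general idea of a variable-length Markov model applied to a categorical event log, the sketch below counts which symbol follows each context of up to `max_order` preceding symbols, predicts from the longest context seen often enough, and scores each event by its surprise in bits. This is not the dissertation's algorithm: the class name `VariableLengthMarkov`, the parameters `max_order`, `min_count`, and `alpha`, the add-alpha smoothing, and the toy event names are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


class VariableLengthMarkov:
    """Illustrative longest-suffix Markov model over categorical symbols (hypothetical API)."""

    def __init__(self, max_order=3, min_count=2, alpha=1.0):
        self.max_order = max_order   # longest context length that is stored
        self.min_count = min_count   # a context is used only if seen this often
        self.alpha = alpha           # add-alpha smoothing constant (an assumption)
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts
        self.alphabet = set()

    def fit(self, sequence):
        seq = list(sequence)
        self.alphabet.update(seq)
        for i, symbol in enumerate(seq):
            # Record the symbol under every context length from 0 up to max_order.
            for k in range(min(self.max_order, i) + 1):
                self.counts[tuple(seq[i - k:i])][symbol] += 1
        return self

    def _context_for(self, history):
        # Back off from the longest suffix of the history to the empty context.
        history = tuple(history)
        for k in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - k:]
            seen = self.counts.get(context)
            if seen is not None and sum(seen.values()) >= self.min_count:
                return context
        return ()

    def prob(self, history, symbol):
        seen = self.counts.get(self._context_for(history), Counter())
        size = len(self.alphabet | {symbol})
        return (seen[symbol] + self.alpha) / (sum(seen.values()) + self.alpha * size)

    def surprise(self, sequence):
        # Per-event anomaly score: negative log2 probability given the preceding events.
        seq = list(sequence)
        return [
            -math.log2(self.prob(seq[max(0, i - self.max_order):i], s))
            for i, s in enumerate(seq)
        ]


# Toy usage: a regular log, then a trace where "close" follows "open" directly.
model = VariableLengthMarkov().fit(["open", "read", "close"] * 20)
scores = model.surprise(["open", "read", "close", "open", "close", "close"])
for event, score in zip(["open", "read", "close", "open", "close", "close"], scores):
    print(f"{event:>5}  {score:.2f} bits")
```

Events with high surprise are candidates for anomalies. The threshold for flagging them, and how numerical variables would be handled alongside categorical ones, are left open here.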
Keywords/Search Tags:Systems, Anomaly detection, Data, Categorical, Fault, Large, Techniques