Anomaly detection in heterogeneous datasets

Posted on: 2008-10-10
Degree: Ph.D
Type: Dissertation
University: Rutgers The State University of New Jersey - Newark
Candidate: Janeja, Vandana
Full Text: PDF
GTID: 1448390005978650
Subject: Computer Science
Abstract/Summary:
Anomaly detection is a data mining technique that deals with discovering non-trivial and intriguing knowledge in the form of unusual patterns, objects, and relationships. Such discovery works on the principle of identifying anomalies with respect to similarly behaving partitions in the data. In this dissertation, we argue that we need to account for heterogeneity within the data when creating these partitions in order to identify anomalies accurately. In particular, we propose novel approaches to identify anomalous (a) windows, (b) individual objects, and (c) relationships among such objects in spatio-temporal and traditional data. By comparing the results of our proposed approaches with those of representative anomaly detection techniques, we have demonstrated that accounting for heterogeneity leads to significant improvement in identifying anomalies.

First, we have proposed a random walk-based Free-Form Spatial Scan Statistic approach, called FS3, to identify natural, free-form anomalous windows. A window is a contiguous part of a region, comprising the objects in it. An anomalous window is unusual compared to the rest of the windows in the region in terms of a specific event of interest. FS3 eliminates the limitations of existing approaches by identifying windows that are not restricted to a predefined shape (e.g., circular or rectangular). Our results have indicated that FS3 identifies more refined anomalous windows, with a better likelihood ratio of being an anomaly, than those identified by earlier spatial scan statistic approaches.

While FS3 effectively identifies anomalous windows in terms of an event of interest throughout a region, it may not identify individual spatial objects that are outliers with respect to a group of similarly behaving objects forming a neighborhood. In this dissertation, we have proposed a spatial outlier detection approach that detects individual anomalous objects by taking into account both auto-correlation and spatial heterogeneity, whereas existing approaches consider only auto-correlation. Another distinguishing feature of our approach is that it is order invariant in neighborhood creation. Additionally, we have developed an approach to identify spatio-temporally coalesced outliers, comprising outliers separated by a small time lag, to capture an anomaly traversing the region. Experimental results on sensor datasets have demonstrated a significant improvement in neighborhood formation and outlier detection compared to existing approaches.

Finally, we have studied the problem of Collusion Set Detection (CSD) to discover anomalous relationships among objects. In isolation, individual objects may appear benign, but when considered in relation to each other, they can exhibit interesting behaviors. The goal of CSD is to identify a combination of objects that together satisfy a notion of interesting collusive behavior. In this dissertation, we have focused on the specific sub-problem of discovering collusive behavior among outlier objects (CSD'), which considers spatial, temporal, and semantic proximities among objects. We have devised a novel collusion distance metric to identify distance-based outliers and the causal attributes for the outlying behavior. We have incorporated an ontology-based definition from multiple heterogeneous sources for discovering the semantic relationships between the causal attributes. We have demonstrated that CSD' improves both precision and recall compared to the traditional approach of Euclidean distance-based outlier detection. It has also significantly reduced the cost of identifying the collusion sets, since we use only the causal attributes, rather than the full attribute set, to identify the collusions between the outlier objects.
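
For readers unfamiliar with the baseline named above, the following is a minimal sketch of traditional Euclidean distance-based outlier detection, in which an object is flagged when a large fraction of the other objects lie farther than a chosen distance from it. It is not the collusion distance metric or the CSD' method of the dissertation; the function names, the `radius` and `min_far_fraction` parameters, and the sample readings are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_based_outliers(points, radius, min_far_fraction):
    """Return indices of points that are distance-based outliers.

    A point is flagged when at least `min_far_fraction` of the OTHER
    points lie farther than `radius` from it. Both thresholds are
    illustrative parameters, not values taken from the dissertation.
    """
    n = len(points)
    outliers = []
    for i, p in enumerate(points):
        # Count the other points that are farther than `radius` from p.
        far = sum(
            1
            for j, q in enumerate(points)
            if j != i and euclidean(p, q) > radius
        )
        if n > 1 and far / (n - 1) >= min_far_fraction:
            outliers.append(i)
    return outliers


# Hypothetical two-attribute sensor readings: four similar, one distant.
readings = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.0, 1.1), (8.0, 9.0)]
print(distance_based_outliers(readings, radius=2.0, min_far_fraction=0.9))
```

This baseline computes distances over the full attribute set for every pair of objects, which is the cost the abstract says is reduced by restricting the comparison to causal attributes.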
Keywords/Search Tags:Detection, Objects, Data, Anomaly, Identify, Causal attributes, Outlier, FS3