
Data stream classification techniques for multiple novel classes and dynamic feature spaces

Posted on: 2011-02-12
Degree: Ph.D
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Chen, QIng
Full Text: PDF
GTID: 1448390002966944
Subject: Engineering
Abstract/Summary:
Multi-step methodologies and multi-scan algorithms that are suitable for knowledge discovery and data mining cannot be readily applied to data streams. Data stream classification is more challenging because of practical constraints on efficient processing and the temporal behavior of the stream. Two well-studied aspects are infinite length and concept drift. Because a data stream is a continuous process that is theoretically infinite in length, it is impractical to store and use all the historical data for training. Data streams also frequently experience concept drift as a result of changes in the underlying concepts. However, two other important characteristics of data streams, namely concept evolution and feature evolution, are rarely addressed in the literature. Concept evolution occurs when novel classes arrive in the stream, and feature evolution occurs when new features emerge in the stream. This dissertation addresses concept evolution and feature evolution in addition to the existing challenges of infinite length and concept drift. Although a few data stream classification techniques address concept evolution, none of them considers feature evolution.
In this dissertation, the phenomena of concept evolution and feature evolution are studied, and the insights are used to construct superior novel class detection techniques. First, the dynamic nature of the feature space is considered, and an effective solution is provided for classification and novel class detection when the feature space is dynamic. Second, an adaptive threshold is proposed for outlier detection, which is a vital part of novel class detection. Third, a probabilistic approach to novel class detection is proposed that uses a discrete Gini coefficient, and its effectiveness is demonstrated both theoretically and empirically. Finally, the issue of simultaneous occurrence of multiple novel classes is addressed, and an elegant solution is provided to detect more than one novel class at the same time. Comparison with state-of-the-art data stream classification techniques on several real and synthetic data streams establishes the effectiveness of the proposed approach.
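The abstract names a discrete Gini coefficient but does not say how it is computed or applied to novel class detection. As a point of reference only, the following minimal sketch shows the standard discrete Gini coefficient of a list of non-negative values. The function name gini_coefficient and the idea of feeding it per-instance scores are assumptions made for illustration, not the dissertation's method.

```python
def gini_coefficient(values):
    """Standard discrete Gini coefficient of non-negative values.

    Returns 0.0 when all values are equal and approaches 1.0 as the
    total becomes concentrated in a single value. Illustrative only;
    the dissertation's exact formulation is not given in the abstract.
    """
    xs = sorted(values)
    if any(x < 0 for x in xs):
        raise ValueError("values must be non-negative")
    n = len(xs)
    total = sum(xs)
    # An empty or all-zero input has no inequality to measure.
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2.0 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical scores: evenly spread versus concentrated in one item.
    print(gini_coefficient([1, 1, 1, 1]))
    print(gini_coefficient([0, 0, 0, 1]))
```

For n values the coefficient ranges from 0 (all values equal) to (n - 1) / n (the whole total held by one value), so any decision threshold built on it would need to account for the sample size.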
Keywords/Search Tags:Data, Novel class, Feature, Concept evolution, Dynamic