An effective evolving data stream classification

Posted on: 2013-07-27
Degree: Ph.D.
Type: Dissertation
University: The University of Texas at Dallas
Candidate: Al-khateeb, Tahseen M
GTID: 1458390008469731
Subject: Computer Science
Abstract/Summary:
Overwhelming volumes of continuously generated data (data streams) are now common in domains such as e-commerce, education, health, and social networks. Examples of data streams include network traffic, social media blogs, and credit card transactions. Data stream classification is the task of predicting the class labels of future instances using classification models trained on past labeled data, and it has been widely studied recently because of its ever-growing demand. The infinite length of data streams, combined with their dynamic nature, creates considerable challenges for mining research: because the stream keeps changing, the classification model must be refined continuously with newly arriving training data.
One of the dynamic characteristics of data streams is concept-drift, which occurs when the underlying concept of the data changes, making previously trained models outdated. Another major challenge in data stream classification is concept-evolution, the emergence of a new or novel class. A special and more common case of concept-evolution is a recurring class, which occurs when a class reappears after a long absence from the stream. Existing classification techniques that address concept-evolution wrongly detect recurring classes as novel classes, which degrades classifier performance and wastes system and human resources.
This dissertation addresses all of the aforementioned challenges. We address infinite length and concept-drift by proposing bounding component ensemble classification models that exploit a class-based ensemble instead of the traditional chunk-based ensemble. We tackle concept-evolution and recurring classes by proposing novel and recurring class detection that augments the model with an additional ensemble and extrapolates the class-based ensemble.
We show analytically and confirm empirically on both synthetic and several benchmark data streams that the proposed techniques efficiently and significantly reduce classification errors compared to state-of-the-art techniques.
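The abstract contrasts a class-based ensemble with the traditional chunk-based one and separates recurring classes from novel classes by keeping an additional ensemble. The sketch below is a hypothetical, heavily simplified illustration of that idea, not the dissertation's algorithm. It assumes each class keeps its own bounded list of models, each model being a centroid plus a covering radius (a "hypersphere"); classes that vanish from the stream move to an auxiliary ensemble; and a test point that falls outside every primary model but inside an auxiliary one is reported as a recurring class rather than a novel one. All names (`SphereModel`, `ClassBasedEnsemble`, `max_models`, the `"recurring:"` and `"novel-candidate"` labels) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class SphereModel:
    """Hypothetical per-class model: a centroid plus the radius covering its training points."""
    centroid: Point
    radius: float

    @classmethod
    def fit(cls, points: Sequence[Point]) -> "SphereModel":
        dim = len(points[0])
        centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        radius = max(dist(centroid, p) for p in points)
        return cls(centroid, radius)

    def covers(self, x: Point) -> bool:
        return dist(self.centroid, x) <= self.radius


@dataclass
class ClassBasedEnsemble:
    """One bounded micro-ensemble per class, plus an auxiliary store for vanished classes."""
    max_models: int = 3  # assumed bound on the number of models kept per class
    primary: Dict[str, List[SphereModel]] = field(default_factory=dict)
    auxiliary: Dict[str, List[SphereModel]] = field(default_factory=dict)

    def train_chunk(self, chunk: Dict[str, List[Point]]) -> None:
        """Update the ensemble from one labeled chunk, keyed by class label."""
        for label, points in chunk.items():
            self.auxiliary.pop(label, None)  # a reappearing class leaves the auxiliary store
            models = self.primary.setdefault(label, [])
            models.append(SphereModel.fit(points))
            if len(models) > self.max_models:
                models.pop(0)  # discard the oldest model so the class tracks concept-drift
        # Simplification: a class missing from this chunk moves to the auxiliary ensemble.
        for name in [n for n in self.primary if n not in chunk]:
            self.auxiliary[name] = self.primary.pop(name)

    @staticmethod
    def _nearest_covering(store: Dict[str, List[SphereModel]], x: Point) -> Optional[str]:
        """Label of the closest model (by centroid distance) whose sphere contains x, if any."""
        hits = [(dist(m.centroid, x), label)
                for label, models in store.items()
                for m in models if m.covers(x)]
        return min(hits)[1] if hits else None

    def classify(self, x: Point) -> str:
        """Return a known class label, 'recurring:<label>', or 'novel-candidate'."""
        known = self._nearest_covering(self.primary, x)
        if known is not None:
            return known
        recurring = self._nearest_covering(self.auxiliary, x)
        if recurring is not None:
            return f"recurring:{recurring}"
        return "novel-candidate"  # outside every primary and auxiliary model
```

Reporting `novel-candidate` for a single point is a deliberate simplification; a practical detector would more plausibly wait until enough such outliers form a cohesive group before declaring a novel class. Keeping a separate, bounded model list per class (rather than one model per chunk covering all classes) is what lets a vanished class be set aside and recognized again when it recurs.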
Keywords/Search Tags: Data stream, Classification