An effective evolving data stream classification

Posted on:2013-07-27

Degree:Ph.D

Type:Dissertation

University:The University of Texas at Dallas

Candidate:Al-khateeb, Tahseen M

Full Text:PDF

GTID:1458390008469731

Subject:Computer Science

Abstract/Summary:

Overwhelming volumes of continuously generated data (data streams) are now common in domains such as e-commerce, education, health, and social networks. Examples of data streams include network traffic, social media blogs, and credit card transactions. Data stream classification is the task of predicting the class labels of future instances using classification models trained on past labeled data. It has been widely studied in recent years because of ever-growing demand. The infinite length of data streams, combined with their dynamic nature, creates considerable challenges for data mining research. Because of this dynamic nature, the classification model must be refined continuously with newly arriving training data. One dynamic characteristic of data streams is concept-drift, which occurs when the underlying concept of the data changes, making previously trained models outdated. Another major challenge in data stream classification is concept-evolution, which refers to the emergence of a new (novel) class. A special and more common case of concept-evolution in data streams is a recurring class, which occurs when a class reappears after a long absence from the stream. Existing classification techniques that address concept-evolution wrongly detect recurring classes as novel classes, which deteriorates classifier performance and wastes system and human resources.

This dissertation addresses all of the aforementioned challenges in a number of ways. We address infinite length and concept-drift by proposing bounding component ensemble classification models that exploit a class-based ensemble instead of the traditional chunk-based ensemble. We tackle the recurring class and concept-evolution issues by proposing novel and recurring class detection techniques based on augmenting an additional ensemble and extrapolating the class-based ensemble.
We show analytically and confirm empirically on both synthetic and several benchmark data streams that the proposed techniques efficiently and significantly reduce classification errors compared to state-of-the-art techniques.
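As a rough illustration of the class-based ensemble idea summarized above, the Python sketch below keeps one bounded micro-ensemble of models per class rather than one model per chunk. Memory stays bounded (infinite length), and the oldest model of a class is dropped as new ones arrive (concept-drift). Classes that stop appearing are parked instead of discarded, so a reappearing class can be recognized as recurring rather than novel. This is not the dissertation's algorithm. Each per-chunk class model is reduced to a single hypersphere, and novel-class detection simply waits for `q` outliers without the cohesion test a real detector would apply. All names (`Sphere`, `ClassBasedEnsemble`, `max_models`, `q`, `inactive_after`) are hypothetical.

```python
import math
from collections import defaultdict, deque


class Sphere:
    """One class's data in one chunk, summarised as a single hypersphere (simplification)."""

    def __init__(self, points):
        dim = len(points[0])
        self.centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        self.radius = max(math.dist(p, self.centroid) for p in points)

    def contains(self, x):
        return math.dist(x, self.centroid) <= self.radius


class ClassBasedEnsemble:
    """Hypothetical sketch: one bounded micro-ensemble of Sphere models per class."""

    def __init__(self, max_models=3, q=3, inactive_after=5):
        self.max_models = max_models          # bound on models kept per class
        self.q = q                            # outliers needed before declaring a novel class
        self.inactive_after = inactive_after  # chunks without a class before it is parked
        self.active = {}                      # label -> deque of Sphere
        self.inactive = {}                    # label -> deque of Sphere, kept for recurring classes
        self.last_seen = {}                   # label -> chunk number of last labelled data
        self.chunk_no = 0
        self.outliers = []

    def train_chunk(self, labelled):
        """Update the per-class micro-ensembles from one chunk of (x, y) pairs."""
        self.chunk_no += 1
        by_class = defaultdict(list)
        for x, y in labelled:
            by_class[y].append(x)
        for y, pts in by_class.items():
            # Reuse the class's existing models (active or parked), else start a new bounded deque.
            models = (self.active.pop(y, None) or self.inactive.pop(y, None)
                      or deque(maxlen=self.max_models))
            models.append(Sphere(pts))  # a full deque drops its oldest model (concept-drift)
            self.active[y] = models
            self.last_seen[y] = self.chunk_no
        # Park classes that have not appeared recently instead of discarding them.
        stale = [y for y, t in self.last_seen.items()
                 if y in self.active and self.chunk_no - t >= self.inactive_after]
        for y in stale:
            self.inactive[y] = self.active.pop(y)

    def classify(self, x):
        """Return a known label, a recurring label, 'OUTLIER', or 'NOVEL'."""
        best, best_d = None, math.inf
        for y, models in self.active.items():
            for m in models:
                d = math.dist(x, m.centroid)
                if m.contains(x) and d < best_d:
                    best, best_d = y, d
        if best is not None:
            return best
        # Check parked classes before flagging x, so a recurring class is not called novel.
        for y, models in self.inactive.items():
            if any(m.contains(x) for m in models):
                return y
        self.outliers.append(x)
        if len(self.outliers) >= self.q:  # no cohesion test here: a deliberate simplification
            self.outliers.clear()
            return "NOVEL"
        return "OUTLIER"
```

To use it, call `train_chunk` on each labelled chunk as it arrives and `classify` on incoming unlabelled instances. Parked classes are consulted before any instance is buffered as an outlier. That ordering reflects the abstract's point that a recurring class should not be mistaken for a novel one.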

Keywords/Search Tags:

Data stream, Classification

Related items

1	Research On Data Stream Classification Algorithm With Limited Amount Of Labeled Data
2	Research On The Classification Methods For Dynamic Data Stream
3	Data Stream Classification Algorithm Based On The Ep
4	Research On Stream Data Classification Algorithm Mining Based On Spark
5	Research On The Emerging Patterns-based Integrative Weighted Classification Algorithm For Stream Data
6	The Research On Massive And Dynamic Data Stream Classification Method
7	Research On Classification Algorithms Of Data Stream
8	The Research On Classification Algorithms Over Data Stream
9	Classification Of Dynamic Data Stream With Noise
10	Research On Stream Data Classification Algorithm Based On STORM