
Classification Algorithm For Data Streams With Concept Drift And Its Applications

Posted on: 2014-02-20
Degree: Master
Type: Thesis
Country: China
Candidate: N Li
Full Text: PDF
GTID: 2268330401474771
Subject: Computer application technology
Abstract/Summary:
The classification of concept-drifting data streams has been an active research topic in recent years, with main applications including credit card fraud detection and network intrusion detection. Unlike traditional classification models, algorithms for mining data streams must respond quickly and adapt to concept drift while keeping memory demands low.

This thesis discusses the characteristics and problems of existing classification models for data streams. After proposing an improved version of the KNNModel algorithm, named IKNNModel, we extend its core idea to the classification of concept-drifting data streams. The major contributions are:

1. An improved KNNModel algorithm named IKNNModel. With the KNNModel algorithm, it is difficult to construct the model clusters for each class when training instances from different classes overlap in the original space. IKNNModel projects the training instances of different classes onto their own optimal subspaces and constructs the corresponding class cluster and pure clusters for each class as the basis for classification. Compared with the traditional KNNModel algorithm, IKNNModel improves the efficiency of classification.

2. An ensemble classification algorithm for data streams (ECA). When the concepts in a data stream drift, existing algorithms have to rebuild the whole model to adapt to the current concepts. ECA applies the core idea of IKNNModel to the data stream setting. It constructs a central point and a corresponding subspace for every class on each data block, so that when the concepts of only a few classes drift, only the corresponding part of the current classification model needs to be rebuilt, which speeds up instance processing.

3. A classification model for data streams based on a mixture model (KnnM-IB). Most existing classification algorithms for data streams make the impractical assumption that the true labels of test instances become available right after the instances are classified, and they use these labels to detect concept drift and adjust the current model. With the help of semi-supervised learning and a variable window size, KnnM-IB can detect concept drift in a data stream and update the model effectively with a limited number of labeled instances.

Experiments on both synthetic and real data sets show the efficiency and effectiveness of the proposed algorithms.
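
The partial-rebuild idea described for ECA can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that each class is summarized only by its central point (the per-class subspace projection is omitted) and that a class counts as drifted when its central point moves by more than a fixed Euclidean distance between blocks. The abstract specifies neither. The names `class_centers`, `update_model`, `classify` and `threshold` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def class_centers(block):
    """Return {label: mean feature vector} for one block of (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in block:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def update_model(model, block, threshold):
    """Rebuild only the classes that are new or whose center moved more than
    `threshold` (an assumed drift test). Returns the labels that were rebuilt."""
    rebuilt = []
    for y, center in class_centers(block).items():
        if y not in model or dist(model[y], center) > threshold:
            model[y] = center  # replace only this class's part of the model
            rebuilt.append(y)
    return rebuilt


def classify(model, x):
    """Assign x to the class with the nearest central point."""
    return min(model, key=lambda y: dist(model[y], x))


# Two consecutive blocks: class "a" is stable, class "b" drifts.
block1 = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((5.0, 5.0), "b"), ((5.2, 5.0), "b")]
block2 = [((0.1, 0.0), "a"), ((0.1, 0.2), "a"), ((9.0, 9.0), "b"), ((9.2, 9.0), "b")]

model = {}
print(update_model(model, block1, threshold=1.0))  # both classes are new
print(update_model(model, block2, threshold=1.0))  # only the drifted class
print(classify(model, (8.0, 8.0)))
```

In this sketch the second block triggers a rebuild for class "b" only, while the stored center for class "a" is left untouched. That is the property the abstract attributes to ECA.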
Keywords/Search Tags:data mining, data streams, classification, subspace, concept drift, semi-supervised learning