
Mining Data Stream With Skewed Distribution Based On Ensemble Method

Posted on: 2011-06-19; Degree: Master; Type: Thesis
Country: China; Candidate: Y Wang; Full Text: PDF
GTID: 2178360305474535; Subject: Computer software and theory
Abstract/Summary:
In many real-life applications, such as credit fraud detection and network intrusion detection, the class distribution is imbalanced. In these applications, positive instances, which are the ones of greater interest, are observed far less often than negative ones. In network intrusion detection, for example, normal network communications far outnumber intrusions. State-of-the-art machine learning algorithms perform well on negative examples but poorly on positive examples. The research community has suggested some effective methods for dealing with imbalanced data. At the same time, data stream mining has become one of the most studied data mining tasks. Most of that research has focused on balanced data streams, and mining skewed data streams has not received enough attention. In this thesis, I propose ensemble algorithms that use majority voting to classify skewed data streams. Two methods are studied for dealing with imbalanced data streams. A cluster-based sampling algorithm is proposed to turn imbalanced data into class-balanced data.
For data stream applications, I use a static classifier ensemble (SCE) and a dynamic classifier ensemble (DCE). This thesis makes three main contributions: (1) a cluster-based sampling algorithm is used to balance imbalanced data sets; (2) the SCE algorithm is proposed to learn models from data streams; (3) building on the SCE algorithm, the DCE algorithm is proposed to improve the performance of data stream mining. Experiments on both synthetic and real data sets that simulate skewed data streams show that the proposed algorithms classify imbalanced data streams effectively.
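
The abstract does not say which clustering algorithm, base learner, or chunking scheme the thesis uses, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes that the stream arrives in labeled chunks that each contain at least one positive instance, that negatives are undersampled by replacing them with k-means centroids (one centroid per positive instance), and that a nearest-centroid rule stands in for the base classifier. All function names (`kmeans`, `balance_chunk`, `train_nearest_centroid`, `majority_vote`, `make_chunk`) are hypothetical. How the DCE differs from the SCE is not described in the abstract, so only a static majority-vote ensemble is shown.

```python
import random
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def balance_chunk(chunk, seed=0):
    """Assumed cluster-based undersampling: keep every positive and
    replace the negatives with as many centroids as there are positives."""
    pos = [x for x, y in chunk if y == 1]
    neg = [x for x, y in chunk if y == 0]
    k = min(len(pos), len(neg))
    reps = kmeans(neg, k, seed=seed)
    return [(x, 1) for x in pos] + [(x, 0) for x in reps]


def train_nearest_centroid(data):
    """Stand-in base learner: predict the class whose centroid is closest."""
    cents = {label: mean([x for x, y in data if y == label]) for label in (0, 1)}
    return lambda x: min(cents, key=lambda label: dist2(x, cents[label]))


def majority_vote(models, x):
    """Static ensemble prediction: the label most models agree on."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]


def make_chunk(rng, n_neg=50, n_pos=5):
    """Synthetic skewed chunk: negatives near (0, 0), positives near (5, 5)."""
    neg = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_neg)]
    pos = [((rng.gauss(5, 1), rng.gauss(5, 1)), 1) for _ in range(n_pos)]
    return neg + pos


rng = random.Random(42)
chunks = [make_chunk(rng) for _ in range(3)]        # three stream chunks
balanced = balance_chunk(chunks[0])                 # 5 positives, 5 centroids
ensemble = [train_nearest_centroid(balance_chunk(c)) for c in chunks]
print(majority_vote(ensemble, (5.0, 5.0)), majority_vote(ensemble, (0.0, 0.0)))
```

An odd number of ensemble members is used here so that a two-class vote cannot tie. Replacing negatives with centroids is one of several cluster-based sampling variants; drawing real negative instances from each cluster would fit the same description.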
Keywords/Search Tags:Skewed Data Streams, Cluster-Sampling, Static Classifier Ensemble, Dynamic Classifier Ensemble