Classifier Ensemble For Data Stream Classification

Posted on: 2012-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: S R Pan
Full Text: PDF
GTID: 2218330344451315
Subject: Computer software and theory
Abstract/Summary:
With the development of communication and computer information technology, application domains such as financial markets, network monitoring, and sensor networks generate large volumes of continuous data streams, and data stream mining has drawn considerable attention in the research community. However, most existing algorithms for data stream classification assume that the streaming data is precise, whereas uncertain or imperfect data is natural and common in real-life applications because of measurement error, transmission delay, missing values, and similar causes. Most streaming algorithms also assume that the data stream can be fully labeled, yet labeling an entire stream is impractical and time-consuming given limited human resources. It is therefore important to study the classification of uncertain data streams and the classification of data streams in which only a small part of the data is labeled.

This thesis focuses on data stream classification with uncertainty in the class values, and on the classification of data streams that contain only positive and unlabeled data (PU Stream). The major contributions are:

(1) The uncertain decision tree algorithm NS-PDT can only handle discrete attributes. This thesis extends NS-PDT to continuous attributes by traversing all possible splitting points and selecting the one with the maximum non-specificity gain.
(2) This thesis proposes a Static Classifier Ensemble (SCE) algorithm for uncertain data stream classification. SCE uses the extended NS-PDT as its base classifier and a weighted voting scheme to predict unlabeled data, and it achieves good classification performance.
(3) Building on SCE, this thesis proposes a novel Dynamic Classifier Ensemble (DCE) algorithm, which sets the weight of each base classifier according to the test example in order to improve on the classification performance of SCE.
(4) This thesis proposes a novel algorithm, DCEPU, for PU Stream classification. DCEPU constructs an appropriate validation set and employs a novel weighting scheme, which lets it cope with concept drift in the PU Stream scenario.

SCE and DCE are evaluated and compared on the synthetic datasets SEA and Hyperplane and on the real-life dataset RCV1-v2. The experimental results show that both SCE and DCE can handle concept drift, and that DCE achieves an improvement of about 2% in PCC_dist accuracy over SCE. Various concept drift scenarios are simulated on RCV1-v2 to evaluate the effectiveness of DCEPU, and the experimental results show that DCEPU achieves a maximum improvement of 3.3% in F1 over the Stacking algorithm.
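The difference between a static and a dynamic weighted vote can be illustrated with a minimal sketch. It is not the thesis's implementation: the base classifiers here are hypothetical threshold functions rather than NS-PDT trees, and the weighting rules (accuracy on a whole validation set for the static case, accuracy on the `k` validation examples nearest to the test example for the dynamic case) are assumptions chosen for illustration, since the abstract does not state how SCE or DCE compute their weights.

```python
from collections import defaultdict
from math import dist


def weighted_vote(classifiers, weights, x):
    """Return the label with the largest total weight among base predictions."""
    scores = defaultdict(float)
    for clf, w in zip(classifiers, weights):
        scores[clf(x)] += w
    return max(scores, key=scores.get)


def static_weights(classifiers, validation):
    """Assumed static rule: one weight per classifier, its validation accuracy."""
    n = len(validation)
    return [sum(clf(x) == y for x, y in validation) / n for clf in classifiers]


def dynamic_weights(classifiers, validation, x, k=2):
    """Assumed dynamic rule: accuracy on the k validation examples nearest to x."""
    nearest = sorted(validation, key=lambda ex: dist(ex[0], x))[:k]
    return static_weights(classifiers, nearest)


# Hypothetical stand-ins for trained base classifiers.
classifiers = [
    lambda x: int(x[0] > 0.5),  # looks only at the first feature
    lambda x: int(x[1] > 0.5),  # looks only at the second feature
    lambda x: 1,                # always predicts the positive class
]

# Toy labeled validation set of (features, label) pairs.
validation = [
    ((0.1, 0.1), 0), ((0.2, 0.2), 0), ((0.3, 0.1), 0),
    ((0.9, 0.9), 1), ((0.8, 0.9), 1), ((0.9, 0.2), 0),
]

x_test = (0.85, 0.3)
static_label = weighted_vote(
    classifiers, static_weights(classifiers, validation), x_test)
dynamic_label = weighted_vote(
    classifiers, dynamic_weights(classifiers, validation, x_test), x_test)
print("static vote:", static_label)
print("dynamic vote:", dynamic_label)
```

On this toy data the two schemes disagree for `x_test`: the static weights are fixed once for every test example, while the dynamic weights favor whichever base classifier is accurate in the neighborhood of the example being classified.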
Keywords/Search Tags: uncertain data, data stream classification, concept drift, classifier ensemble, PU data stream