
Study On Data Streams Classification Algorithms Based On Ensemble Classifier

Posted on: 2013-08-27  Degree: Master  Type: Thesis
Country: China  Candidate: X Z Ma  Full Text: PDF
GTID: 2298330467974661  Subject: Computer software and theory
Abstract/Summary:
Data streams, as a new form of data, appear in many everyday applications, for instance Internet transaction logs, telephone call logs, credit card transactions, and sensor and monitoring systems. Compared with traditional data, data streams are potentially infinite, continuous, fast-arriving, and subject to concept drift. These features make it impossible to apply traditional classification techniques to data streams. It is therefore necessary to design one-pass, dynamically updated algorithms that work in real time.

Ensemble classifiers have the advantages of easy model updating, high accuracy, and rapid adjustment to concept drift, so they have become the main direction in data stream classification research. However, ensemble classifiers also have flaws; for example, most of them do not take the diversity and independence of the base classifiers into account.

This thesis studies ensemble methods for data stream classification. First, it introduces the fundamentals of data stream classification, related work, and the research background, analyzes the common modeling methods and their advantages, and describes in detail the methods for dealing with concept drift. Second, it describes the classic ensemble classifier for data streams, Accuracy Weighted Ensemble (AWE). Based on this classic algorithm, a data stream classification algorithm, CUE, is proposed that can handle concept drift. CUE improves on AWE in the weight assignment of base classifiers, the use of training data, and the updating of the ensemble model. Experiments on data sets show that, compared with AWE, CUE achieves better predictive accuracy and is more effective. Third, it introduces the dynamic classifier selection algorithm. Following the idea of that algorithm, a new algorithm named Dynamic Classifier Selection with Clustering (DCSC) is proposed. The main idea is that, for a new incoming instance to be classified, the closest classifier is chosen to classify it. Because it avoids a complicated weight assignment process, DCSC classifies instances quickly. Experiments on data sets show that DCSC is effective and time-efficient and performs better in dealing with concept drift.
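
The "closest classifier" idea can be illustrated with a minimal Python sketch. It is not the thesis's DCSC implementation, whose clustering step the abstract does not detail. As an assumption, each stream chunk is summarized by a single centroid in place of a real clustering step, and a toy nearest-class-mean learner stands in for the base classifier. All names (`ClosestClassifierEnsemble`, `NearestMeanClassifier`, `add_chunk`, `max_members`) are hypothetical.

```python
import math
from collections import deque


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


class NearestMeanClassifier:
    """Toy base learner (assumption): predicts the class with the closest mean."""

    def __init__(self, X, y):
        by_class = {}
        for x, label in zip(X, y):
            by_class.setdefault(label, []).append(x)
        self.class_means = {c: centroid(pts) for c, pts in by_class.items()}

    def predict(self, x):
        return min(self.class_means,
                   key=lambda c: math.dist(x, self.class_means[c]))


class ClosestClassifierEnsemble:
    """Keeps one (centroid, classifier) pair per chunk of the stream."""

    def __init__(self, max_members=5):
        # A bounded deque drops the oldest member when a new one arrives,
        # so outdated concepts are eventually forgotten.
        self.members = deque(maxlen=max_members)

    def add_chunk(self, X, y):
        # Assumption: one centroid per chunk replaces a full clustering step.
        self.members.append((centroid(X), NearestMeanClassifier(X, y)))

    def predict(self, x):
        # Select the single member whose centroid is closest to x;
        # no per-classifier weights are computed or combined.
        _, clf = min(self.members, key=lambda m: math.dist(x, m[0]))
        return clf.predict(x)


ens = ClosestClassifierEnsemble(max_members=2)
ens.add_chunk([[0.0, 0.0], [1.0, 0.0]], ["a", "b"])
ens.add_chunk([[10.0, 10.0], [11.0, 10.0]], ["b", "a"])
print(ens.predict([0.1, 0.0]))    # handled by the first chunk's classifier
print(ens.predict([10.1, 10.0]))  # handled by the second chunk's classifier
```

Because prediction consults only one base classifier, its cost is one distance computation per ensemble member plus a single classification, which is consistent with the abstract's point about skipping weight assignment.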
Keywords/Search Tags: data streams, ensemble classifier, concept drift, clustering