
Window-based Classification Algorithms For Concept Drifting Data Streams

Posted on: 2012-11-12    Degree: Master    Type: Thesis
Country: China    Candidate: Q Zhu
GTID: 2178330335461580    Subject: Computer application technology
Abstract/Summary:
With the development of information technology, large volumes of data are generated in numerous application fields, such as network security, stock analysis, e-commerce and weather monitoring. These data hide abundant, valuable information that urgently needs to be mined. Motivated by this, increasing attention has been paid to learning from data streams. Data streams are fast, continuous, high-volume, open-ended and subject to concept drift, which poses a challenge for most traditional classification algorithms. This thesis focuses on the classification of concept-drifting data streams, and its main contributions are as follows:
(1) Open problems in data stream mining are first summarized, and related work on the classification of concept-drifting data streams is reviewed and analyzed.
(2) A fixed-window-based classification algorithm for data streams with concept drift, named SWCDS, is proposed to cope with these characteristics. SWCDS uses random forests of decision trees as its base classifiers, detects concept drift with a sliding-window mechanism, and updates the classifier model dynamically to adapt to the drift. Extensive experiments show that, compared with several state-of-the-art classification algorithms for concept-drifting data streams, SWCDS significantly improves both robustness to noise and classification accuracy.
(3) Building on this work, a new classification algorithm for concept-drifting data streams based on a double-window mechanism, named DWCDS, is proposed. DWCDS uses the same base classifier as SWCDS, but to overcome the weaknesses of the single-window mechanism, it introduces a double-window mechanism to detect different types of concept drift. Experimental results show that the double-window mechanism detects various concept drifts in streaming data more quickly and efficiently than the single-window mechanism.
(4) Finally, a prototype system for mining concept-drifting data streams is designed. It integrates the SWCDS and DWCDS algorithms described above and provides an experimental platform for classification.
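The abstract does not specify how SWCDS and DWCDS actually detect drift, so the Python sketch below only illustrates the general contrast between single-window and double-window drift detection, under assumptions that are not taken from the thesis. Each detector consumes a 0/1 error signal (1 means the base classifier misclassified the instance). The single-window detector signals drift when its windowed error rate exceeds a fixed `threshold`. The double-window detector signals drift when the error rate of a short recent window exceeds that of a long reference window by more than `delta`. The class names, window sizes and thresholds are all hypothetical, and the random-forest base classifier and the model update after a drift signal are not shown.

```python
from collections import deque


class SingleWindowDetector:
    """Hypothetical single sliding-window drift detector.

    Keeps the 0/1 errors of the last `size` predictions and signals drift
    once the window is full and its error rate exceeds `threshold`.
    """

    def __init__(self, size=100, threshold=0.3):
        self.window = deque(maxlen=size)  # oldest errors fall off automatically
        self.threshold = threshold

    def update(self, error):
        """Add one 0/1 error; return True if drift is signalled."""
        self.window.append(error)
        if len(self.window) < self.window.maxlen:
            return False
        return sum(self.window) / len(self.window) > self.threshold


class DoubleWindowDetector:
    """Hypothetical double-window drift detector.

    A long window estimates the reference error rate and a short window
    tracks the most recent errors (the long window also contains them).
    Drift is signalled when the short-window rate exceeds the long-window
    rate by more than `delta`.
    """

    def __init__(self, long_size=200, short_size=20, delta=0.2):
        self.long = deque(maxlen=long_size)
        self.short = deque(maxlen=short_size)
        self.delta = delta

    def update(self, error):
        """Add one 0/1 error; return True if drift is signalled."""
        self.long.append(error)
        self.short.append(error)
        if len(self.short) < self.short.maxlen:
            return False
        long_rate = sum(self.long) / len(self.long)
        short_rate = sum(self.short) / len(self.short)
        return short_rate - long_rate > self.delta


def first_alarm(detector, errors):
    """Return the index of the first drift signal, or None if none occurs."""
    for i, error in enumerate(errors):
        if detector.update(error):
            return i
    return None


# Synthetic 0/1 error stream: 10% errors for 200 steps, then every prediction wrong.
stream = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 100

if __name__ == "__main__":
    print(first_alarm(SingleWindowDetector(), stream))  # 223
    print(first_alarm(DoubleWindowDetector(), stream))  # 205
```

On this synthetic stream, whose error rate jumps from 10% to 100% at step 200, the single-window detector first fires at step 223 and the double-window detector at step 205, because the short window is diluted by far fewer pre-drift errors and is compared against a relative baseline rather than a fixed threshold. In a full system the base classifier would be updated and the detector reset after each signal. This toy comparison only illustrates the mechanism and does not reproduce the thesis's experiments.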
Keywords/Search Tags: Data Streams, Classification, Concept Drift, Random Forests