
Concept-Drifting Detection And Classification In Nonstationary Data Streams

Posted on: 2014-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Zhang
GTID: 2248330398478331
Subject: Computer software and theory
Abstract/Summary:
With the advent of the information era, people have become aware of the value of information and try to extract it in many ways; mining it from data is among the most precise. However, storing, managing and analysing data has become difficult because data come from many kinds of sources, are often incomplete and grow rapidly, and especially because of the appearance of data streams. The gap between the demand for information and the available technology is increasingly evident, leaving people helpless in front of piles of data. Data streams that contain concept drifting, noise and unbalanced distributions are called nonstationary streams. This thesis focuses on concept drifting detection and classification in nonstationary streams.

First, the background and significance of data mining and the challenges of classification are stated. Methods for dealing with concept drifting are then summarized in detail from the perspectives of detection and processing, and classification approaches are discussed for both single and ensemble classifiers. Finally, the problems of these methods are summed up and possible breakthrough points are proposed.

After studying methods for handling concept drifting in the data distribution and summarizing the principle of statistical detection using martingales, the thesis proposes the method of Concept Drifting Detection Based on Martingale (CDDBM). In this method, a change in either the centre or the radius of the data cluster can trigger concept drifting, and the strangeness measure of a data point is redefined accordingly; a statistical test using a double power martingale is proposed. In addition, reasonable control of the accumulation threshold and the detection window size gives the method good results in both theory and experiments and reduces the false alarm rate and the missed alarm rate.

To classify large data streams well and construct an effective classifier, the thesis proposes the Ensemble Classifier for Feature Drifting (ECFD), based on the idea that different features differ in how critical they are for classification. First, an Unsupervised Feature Filter (UFF) based on mutual information is presented, and concept drifting is judged to have occurred when two distinct critical feature sets appear. Classifiers are then built on the data restricted to the selected features and combined into an ensemble. Finally, a weighting method is proposed for voting in classification. Theoretical analysis and extensive experimental results show that the method performs strongly in accuracy, speed and noise immunity.
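As a rough illustration of martingale-based drift detection in general, the sketch below implements the standard randomized power martingale on a one-dimensional stream. It is not the CDDBM method itself: the thesis's redefined strangeness measure, its double power martingale and its window control are not specified in this abstract. The function name `power_martingale_drift`, the use of distance to the running mean as strangeness, and the default values of `epsilon` and `threshold` are all assumptions made for this example.

```python
import random


def power_martingale_drift(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Return the index at which drift is flagged, or None if never flagged.

    Assumed setup (not taken from the thesis): the strangeness of a point is
    its distance to the mean of all points seen so far, and the test
    statistic is the plain randomized power martingale.
    """
    rng = random.Random(seed)  # randomness used only to break ties in p-values
    seen = []                  # points observed so far
    m = 1.0                    # current martingale value
    for i, x in enumerate(stream):
        seen.append(x)
        centre = sum(seen) / len(seen)
        # Recompute every strangeness score against the updated centre.
        scores = [abs(v - centre) for v in seen]
        s_new = scores[-1]
        greater = sum(1 for s in scores if s > s_new)
        equal = sum(1 for s in scores if s == s_new)
        # Randomized p-value: share of points at least as strange as x.
        p = (greater + rng.random() * equal) / len(scores)
        p = max(p, 1e-12)      # guard against a zero p-value
        # Power martingale update: small p-values make the value grow.
        m *= epsilon * p ** (epsilon - 1.0)
        if m > threshold:
            return i
    return None


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical stream: the mean jumps from 0 to 5 at index 200.
    stream = [data_rng.gauss(0, 1) for _ in range(200)]
    stream += [data_rng.gauss(5, 1) for _ in range(200)]
    print(power_martingale_drift(stream, threshold=1000.0))
```

A higher `threshold` lowers the false alarm rate at the cost of a longer detection delay: while the stream is exchangeable, the chance that the martingale ever exceeds a threshold λ is at most 1/λ. This is the same trade-off between false alarms and missed alarms that the abstract describes.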
Keywords/Search Tags: Nonstationary data streams, Concept drifting, Feature selection, Feature drifting, Ensemble classifiers