Study Of Classification Algorithms For Skewed Data Streams Based On Ensemble Framework

Posted on: 2013-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhang
GTID: 2248330377960731
Subject: Computer application technology
Abstract/Summary:
Classification of data streams has become an active research topic in data mining in recent years. However, the class distributions of data streams in real applications are usually unbalanced. Such streams are called Skewed Data Streams (SDS): the number of instances in some classes (the small, or positive, class) is significantly smaller than in others (the major, or negative, class). The positive class is the class of interest, yet its classification accuracy falls well short of what users require, because there are too few positive instances to train a sufficiently good classifier with traditional classification methods. To improve classification accuracy on the positive class, we study efficient, high-performance classification methods for SDS, taking into account the real-time requirements of algorithms in a data stream environment. On this basis, because concept drift is common in SDS, we further investigate methods to detect and adapt to concept drift in SDS. The main contributions are as follows:

(1) We present the definitions, key issues, evaluation criteria, and related handling methods for data streams, together with an overview of classification approaches and recent research on skewed data streams.

(2) Classification on SDS requires not only good accuracy but also high performance. We therefore propose an ensemble method called ECSDS to classify SDS. It uses the F1 value as the threshold for deciding whether to update the classifiers, which lowers the update frequency and hence reduces the time overhead. In addition, misclassified positive instances are added to the training set when the classifiers are updated, to improve classification accuracy. Extensive experiments show that ECSDS performs better in terms of time overhead and prediction accuracy on positive instances.

(3) To address concept drift in SDS, we further propose a new concept drift detection method named CSCEP. CSCEP adopts interval estimation from probability theory to detect concept drift in SDS. To adapt to the new concept, it uses misclassified positive instances to update the classifiers. The experiments show that CSCEP detects concept drift in SDS in a more timely manner and updates the existing classification model quickly, maintaining good classification results.
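
The two mechanisms described in contributions (2) and (3) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract gives neither the threshold value nor the statistic that CSCEP estimates, so the function names, the default threshold of 0.5, and the use of a normal-approximation confidence interval on the error rate are all assumptions made here for illustration.

```python
import math


def f1_score(tp, fp, fn):
    """F1 on the positive class from true-positive, false-positive
    and false-negative counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def needs_update(y_true, y_pred, f1_threshold=0.5, positive=1):
    """Return True when F1 on the latest chunk falls below the threshold.

    The threshold value is a hypothetical default, not taken from the thesis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return f1_score(tp, fp, fn) < f1_threshold


def collect_missed_positives(chunk, y_pred, positive=1):
    """Return the positive instances in a chunk that were misclassified.

    `chunk` is a list of (features, label) pairs. Assumption: these
    instances are kept and added to the training data of the next update.
    """
    return [(x, y) for (x, y), p in zip(chunk, y_pred)
            if y == positive and p != positive]


def error_rate_interval(errors, n, z=1.96):
    """Normal-approximation confidence interval for an error rate."""
    p = errors / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def drift_detected(ref_errors, ref_n, cur_errors, cur_n, z=1.96):
    """Flag drift when the current error rate exceeds the upper bound of
    the interval estimated on a reference window (an assumed criterion)."""
    _, upper = error_rate_interval(ref_errors, ref_n, z)
    return cur_errors / cur_n > upper
```

In such a setup, a stream learner would call needs_update on each labeled chunk and retrain only when it returns True, which is how gating updates on F1 can lower the update frequency; drift_detected would be checked on the same chunk to trigger an immediate update.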
Keywords/Search Tags: Skewed Data Stream, Classification, Ensemble Learning, Concept Drift