Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream

Posted on: 2018-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Bai
Full Text: PDF
GTID: 2348330512979388
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, large volumes of streaming data are generated in scenarios such as credit card fraud monitoring, network traffic monitoring, and online financial transactions. These data contain a great deal of valuable information, and researchers have turned to data stream mining to extract it. Unlike static data, streaming data has three main characteristics: large scale, diversity, and high speed. Traditional data mining techniques therefore cannot be applied directly to data streams, which must be processed in ways suited to these characteristics. In addition, the distribution of streaming data changes over time, a phenomenon known as concept drift, which further increases the difficulty of data stream mining.

Drift detection and classification for data streams have become hot topics in the field of data stream mining. They face two major challenges. First, streaming data is generated at high speed and varies over time in unpredictable ways, and these changes affect classifier performance. Second, streaming data often suffers from class imbalance, which makes the concept drift problem harder to solve. Moreover, the cost of misclassifying a minority sample is usually high, which imposes stricter requirements on the classifier.

This dissertation addresses these issues by studying drift detection and classification methods for data streams. The main work includes:

(1) A drift detection algorithm based on data distribution is proposed. The algorithm detects concept drift in a data stream according to the difference between two data distributions. After a drift is detected, a multivariate statistical test and stored historical information are used to detect recurrent drift. Comparison experiments verify the performance of the proposed algorithm: the results show that it has low false positives, low false negatives, and low detection delay. When combined with a classifier, it effectively improves classification accuracy and can detect recurrent drift.

(2) A classification algorithm based on ensemble learning for imbalanced streaming data is proposed. The algorithm uses undersampling and oversampling techniques to balance the positive and negative samples. The weights of the base learners are determined by their current classification accuracy and the cost of misclassification. The contribution of each base learner to the accuracy of the ensemble is also considered when pruning base learners. The algorithm can both handle class imbalance and adapt to drift in the data stream. Comparison experiments show that the proposed algorithm performs well on imbalanced streaming data.
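
The abstract does not specify how the difference between two data distributions is measured in item (1). As a generic illustration only, and not the thesis's algorithm, the sketch below compares a fixed reference window with a sliding current window using a two-sample Kolmogorov-Smirnov statistic on a single numeric feature. The function names, the window size, the step size, and the significance level are all assumptions made for this example. The multivariate test and the recurrent-drift handling mentioned in the abstract are not covered.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def drift_detected(reference, current, alpha=0.05):
    """True if the two windows differ at significance level alpha.

    Uses the asymptotic critical value of the two-sample KS test;
    this choice of test is an assumption, not taken from the thesis.
    """
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


def first_drift(stream, window=100, alpha=0.01):
    """Return the stream position at which drift is first flagged, or None.

    The first `window` items form the reference distribution; a window of
    the same size then slides over the stream in steps of window // 4.
    """
    reference = stream[:window]
    for end in range(2 * window, len(stream) + 1, window // 4):
        current = stream[end - window:end]
        if drift_detected(reference, current, alpha):
            return end
    return None


# Hypothetical stream: values 0..9 repeat for 400 items, then shift to 10..19.
demo = [i % 10 for i in range(400)] + [10 + i % 10 for i in range(400)]
print(first_drift(demo))
```

On this synthetic stream the detector raises no alarm before the change at index 400 and flags drift once enough post-change items have entered the current window. A smaller step reduces detection delay at the cost of more tests, and therefore more chances of a false alarm, which reflects the trade-off among false positives, false negatives, and delay that the abstract evaluates.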
Keywords/Search Tags: Data stream, Concept drift, Ensemble algorithm, Class imbalance