
Research on Massive and Dynamic Data Stream Classification Methods

Posted on: 2014-02-05  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Y Yao  Full Text: PDF
GTID: 1228330395999225  Subject: Computer application technology
Abstract/Summary:
Data classification is a classic and important problem in data mining that still receives much attention. However, with the development of the Internet of Things and the arrival of the "big data" era, traditional classification methods face new challenges, one of which is the change in the form of data, from static data to dynamic data (data streams). Compared with static data, data streams have three features: large scale, real-time arrival, and dynamic change (concept drift). These features make data stream classification more complex than static data classification. Designing classification models suited to these features has therefore become a research hot spot in data mining. This dissertation addresses each of the three features in the classification setting and proposes three targeted data stream classification methods: (1) for the large-scale problem, an ensemble-learning-based data stream classification model that uses multiple classifiers to classify stream data; (2) for the real-time problem, an incremental-learning-based model that holds multiple classifiers in a rotary structure, with each classifier updating itself by incremental learning; (3) for the concept drift problem, a concept-drift-detection-based model that integrates drift detection with the classifier in the classification process to improve classification accuracy. In addition, the proposed classification methods are applied to anomaly detection in the smart grid and to power supply systems. This work is intended to support future research on data stream classification.
The main contributions are as follows:
(1) For the large-scale problem: the scale of data streams makes the data environment more complex, and traditional classification methods cannot adapt to it. An ensemble-learning-based data stream classification method is proposed that uses Support Vector Machines (SVMs) as base classifiers, with different kernel functions used to construct different classifiers. A Self-Organizing Map (SOM) then clusters the outputs of the individual classifiers to obtain the final classification result. Experimental results validate the effectiveness of the proposed model.
(2) For the real-time problem: inspired by ensemble learning, a rotary-framework-based incremental learning classification model for data streams is designed. The model arranges multiple SVM classifiers in a rotary framework, and each SVM classifier updates itself by incremental learning. By controlling the number of samples in the training set, the model learns only samples it has not yet learned and does not relearn samples it has already learned. This significantly reduces the size of the training set and speeds up classification, meeting the demand for real-time data stream classification.
(3) For the concept drift problem: a concept-drift-detection-based data stream classification model is proposed to address the inability of general data stream classification methods to adapt to concept drift. Before classification, the model uses an information entropy method to determine whether concept drift has occurred. A concept pool mechanism is also proposed, which stores previously encountered concepts and improves the model's resistance to concept drift.
On this basis, a concept drift visualization method is proposed that shows the relationships between concepts and supports the analysis and understanding of concept drift.
This dissertation studies the three features of data streams, namely massive scale, real-time arrival and concept drift, and explores three corresponding aspects: merging of classification results, incremental learning strategy, and concept drift detection. In summary, three models are proposed: an SVM-SOM ensemble model, which classifies the data stream with multiple SVM base classifiers and combines their results using an SOM; a rotary-structure-based incremental learning model, which updates each classifier incrementally to reduce the time cost of retraining and improve real-time classification efficiency; and a concept-drift-detection-based model, which integrates drift detection into the classification process to improve the classifier's resistance to concept drift. These studies can significantly improve data stream classification efficiency and provide useful references for future work on the data stream classification problem.
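The entropy-based drift check and concept pool described in item (3) could be sketched roughly as follows. The abstract does not give the actual statistic or pool design, so this is only an illustration under stated assumptions: Jensen-Shannon divergence (an entropy-based measure) stands in for the unspecified "information entropy method", each concept is summarized by the label distribution of a window, and the names `ConceptPool`, `match_or_add` and the threshold value are hypothetical.

```python
import math
from collections import Counter


def distribution(window):
    """Empirical probability of each symbol (e.g. a class label) in a window."""
    counts = Counter(window)
    total = len(window)
    return {k: c / total for k, c in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))


class ConceptPool:
    """Hypothetical pool keeping one distribution per concept seen so far."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold  # assumed value; would need tuning
        self.concepts = []

    def match_or_add(self, window):
        """Return (concept_index, is_new) for the window's distribution."""
        dist = distribution(window)
        for i, known in enumerate(self.concepts):
            if js_divergence(dist, known) <= self.threshold:
                return i, False  # close to a stored concept: reuse it
        self.concepts.append(dist)  # no match: treat as a new concept
        return len(self.concepts) - 1, True
```

In a stream setting, each incoming window would be passed to `match_or_add`; a new index suggests drift to an unseen concept, while a returning index suggests a recurring concept whose stored classifier could be reused. Storing a classifier alongside each distribution is left out here to keep the sketch short.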
Keywords/Search Tags: Data Mining, Data Stream Classification, Ensemble Learning, Incremental Learning, Concept Drift