
Study On Data Streams Online Classification Algorithm Of Adapting To The Concept-Drift

Posted on: 2013-07-19, Degree: Master, Type: Thesis
Country: China, Candidate: Y Ma, Full Text: PDF
GTID: 2298330467474663, Subject: Computer software and theory
Abstract/Summary:
Classification is a core data mining technique with a wide range of applications, and a large body of mature research results has accumulated around it. Data streams, as a newer data model, are continuous, rapidly changing, and unbounded, and can be scanned only once. The knowledge or concepts underlying the data in a stream can change over time, a phenomenon known as concept drift. Traditional data mining classification algorithms have difficulty coping with this in a timely and effective way. Classifying data streams quickly, and detecting and adapting to concept drift effectively, therefore poses a major challenge for data mining research, and data stream classification has become one of its active topics.

Against this background, the thesis studies the classification of data streams with concept drift as follows.

First, the relevant data mining background and the characteristics of data streams are reviewed, and the core ideas of current data stream classification algorithms are studied.

Second, for the classification of data streams with concept drift, an Examples of Weighted Based Classification Algorithm for Concept Drifting Data Streams (EWBC) is designed. The algorithm borrows the idea of boosting: it dynamically adjusts the weights of the training data according to the classification results of each base classifier, so that a newly created classifier converges to the new concept more accurately when concept drift occurs. When old base classifiers that no longer fit the new concept are removed, both the accuracy of each classifier and the diversity among the classifiers are taken into account.
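The boosting-style weighting and the accuracy-plus-diversity pruning described above can be sketched roughly as follows. The abstract does not give EWBC's actual formulas, so this is only an illustration under stated assumptions: the update rule is borrowed from AdaBoost.M1, the diversity measure is plain pairwise disagreement, and the names `reweight`, `weakest_member`, and the trade-off parameter `alpha` are hypothetical.

```python
def reweight(weights, y_true, y_pred):
    """AdaBoost.M1-style update (an assumption, not the thesis's rule):
    shrink the weights of correctly classified examples, then normalise,
    so the next base classifier concentrates on the misclassified ones."""
    total = sum(weights)
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if err == 0 or err >= 0.5:
        # Nothing to emphasise, or the classifier is too weak: reset to uniform.
        return [1.0 / len(weights)] * len(weights)
    beta = err / (1.0 - err)
    updated = [w * beta if t == p else w
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return [w / norm for w in updated]


def weakest_member(member_preds, y_true, alpha=0.5):
    """Index of the ensemble member with the lowest combined score of
    accuracy and mean pairwise disagreement with the other members.
    alpha is a hypothetical trade-off between the two criteria."""
    n = len(y_true)
    scores = []
    for i, preds in enumerate(member_preds):
        acc = sum(p == t for p, t in zip(preds, y_true)) / n
        others = [q for j, q in enumerate(member_preds) if j != i]
        if others:
            div = sum(sum(a != b for a, b in zip(preds, q)) / n
                      for q in others) / len(others)
        else:
            div = 0.0
        scores.append(alpha * acc + (1 - alpha) * div)
    return min(range(len(scores)), key=scores.__getitem__)


# Usage: four equally weighted examples, the third one misclassified.
new_weights = reweight([0.25] * 4, [0, 1, 1, 0], [0, 1, 0, 0])
drop = weakest_member([[0, 1, 1, 0], [0, 1, 0, 0], [1, 0, 0, 1]], [0, 1, 1, 0])
```

With this particular rule, the misclassified examples end up carrying half of the total weight after each update, which is what pushes the next classifier toward the new concept.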
Experiments show that the EWBC algorithm is significantly more accurate than the WEC algorithm.

Lastly, for data streams with recurring concepts and cyclical concept drift, a Historical Concept Based Classification Algorithm for Data Streams (HCBC) is proposed. The algorithm saves every concept that has occurred together with its corresponding classifiers. When a historical concept re-emerges, a set of base classifiers is first selected, using the stored information about historical concepts, to form the ensemble classifier, which speeds up classification. If the classification results do not reach the required accuracy, a new classifier is trained to update the ensemble classifier. Extensive experiments demonstrate the effectiveness and efficiency of the proposed algorithm.
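The reuse-or-retrain loop behind HCBC can be illustrated with a minimal sketch. The abstract does not specify how concepts are represented or matched, so everything here is an assumption: each stored concept is reduced to a single nearest-centroid model (the thesis selects a set of base classifiers), a stored model counts as a match when its accuracy on the incoming labelled chunk reaches a hypothetical `min_accuracy` threshold, and the names `ConceptStore` and `process_chunk` are invented for illustration.

```python
from collections import defaultdict


def train_centroids(X, y):
    """Tiny nearest-centroid learner standing in for any base classifier."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)


class ConceptStore:
    """Keeps one model per concept seen so far (hypothetical structure)."""

    def __init__(self, min_accuracy=0.8):
        self.min_accuracy = min_accuracy  # assumed accuracy requirement
        self.models = []

    def process_chunk(self, X, y):
        """Return (model_index, reused) for a labelled chunk of the stream."""
        best, best_acc = None, -1.0
        for i, model in enumerate(self.models):
            acc = accuracy(model, X, y)
            if acc > best_acc:
                best, best_acc = i, acc
        if best is not None and best_acc >= self.min_accuracy:
            return best, True  # historical concept recognised, no retraining
        self.models.append(train_centroids(X, y))  # new concept: train and store
        return len(self.models) - 1, False


# Usage: concept A, then a drift to concept B, then A recurs.
X = [[0.0], [0.1], [0.9], [1.0]]
store = ConceptStore()
first = store.process_chunk(X, [0, 0, 1, 1])
second = store.process_chunk(X, [1, 1, 0, 0])
third = store.process_chunk(X, [0, 0, 1, 1])
```

In this sketch the saving comes from the third chunk: the recurring concept is served by the stored model instead of a freshly trained one.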
Keywords/Search Tags: data streams, classification, concept drift, ensemble classifier, historical concept