
Research On Ensemble Classifier Model For Data Stream Mining

Posted on: 2014-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: J B Zou
GTID: 2248330395492783
Subject: Management Science and Engineering
Abstract/Summary:
Rapid development in manufacturing control, wireless communication networks, e-commerce transactions, and financial information monitoring has produced high-speed, massive, and dynamic data streams. Because of this massive and dynamic nature, traditional classification algorithms can no longer meet the requirements of data stream processing, and processing data streams effectively to mine valuable information has become a research hotspot at home and abroad. Meanwhile, the rapid development of cloud computing points to a new direction for processing massive, continuous, high-speed data streams. Combining the powerful computing capacity of the cloud with fast and efficient data stream processing will become a major trend in future information processing.

Research on data stream mining concentrates mainly on frequent pattern mining, dynamic classification, and evolving clustering of data streams. This thesis presents an in-depth study on integrating ensemble classification models with cloud computing technology. The main contents are as follows.

First, because the characteristics of a data stream change over time, the target concept of the classification model changes as well, which is the concept drift problem. To address it, this thesis proposes a method that adds scenario characteristics analysis to the ensemble classifier and uses information gain to extract the scenario characteristics. The threshold on the scenario characteristics is set dynamically to predict the occurrence of concept drift. When the variation of the scenario characteristics exceeds this threshold, the ensemble classifier is triggered immediately to create a new base classifier rather than waiting until the accuracy of a base classifier falls below a given threshold, which makes the ensemble classifier capable of feed-forward learning.
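The drift trigger described above can be illustrated with a minimal Python sketch. It is not the thesis's OCEC implementation: the abstract does not define how the scenario characteristics or the dynamic threshold are computed, so the sketch assumes discrete features, takes the per-feature information gain of a window as its "scenario characteristics", and sets the threshold to the mean plus `k` standard deviations of recently observed variations. The names `gain_profile`, `GainDriftTrigger`, `k`, `history`, and `floor` are hypothetical.

```python
import math
from collections import Counter, deque
from statistics import mean, pstdev


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Information gain of one discrete feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_profile(rows, labels):
    """Per-feature information gain of one window (assumed 'scenario characteristics')."""
    return [information_gain(rows, labels, f) for f in range(len(rows[0]))]


class GainDriftTrigger:
    """Signals drift when the gain profile moves more than a dynamic threshold.

    Assumption: threshold = mean + k * std of recent variations, never below `floor`.
    """

    def __init__(self, k=3.0, history=20, floor=0.05):
        self.k = k
        self.floor = floor  # minimum threshold, also used while history is short
        self.variations = deque(maxlen=history)
        self.reference = None

    def update(self, rows, labels):
        """Feed one window; return True if a new base classifier should be trained."""
        profile = gain_profile(rows, labels)
        if self.reference is None:
            self.reference = profile
            return False
        variation = max(abs(a - b) for a, b in zip(profile, self.reference))
        if len(self.variations) >= 2:
            threshold = max(
                self.floor,
                mean(self.variations) + self.k * pstdev(self.variations),
            )
        else:
            threshold = self.floor
        drift = variation > threshold
        if drift:
            # Start tracking the new concept from this window.
            self.reference = profile
            self.variations.clear()
        else:
            self.variations.append(variation)
        return drift
```

In this sketch the trigger fires from a change in the feature-to-label relationship alone, without waiting for any base classifier's accuracy to degrade, which is the feed-forward behavior the paragraph describes.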
The proposed OCEC (Origin Characteristics Ensemble Classifier) model is validated in several computational experiments, which show that OCEC reduces the ensemble generalization error when mining data streams with concept drift and improves the effectiveness of concept drift detection.

Second, the thesis analyzes the trade-off between accuracy and diversity when selecting base classifiers for the final ensemble. Because data streams are potentially infinite and change rapidly, base classifiers must be updated frequently to adapt to continuous changes in the data categories. However, base classifiers may be redundant: more than one base classifier may be generated for a classification task that a single base classifier could complete correctly. The thesis therefore applies a diversity measure to base classifier selection and proposes a new incremental ensemble classification method based on an information entropy diversity measure (Increment Select base Classifier in Ensemble Classifier, Increment_SEC). Introducing the diversity measure makes the data stream processing model more adaptive.

Third, the thesis studies the current trend in cloud computing technology and uses its advantage in massive data processing to improve ensemble classification models with MapReduce. Most existing ensemble classification algorithms are suitable only for small-scale, low-dimensional data streams. To address this defect, and based on an analysis of the characteristics of ensemble classification models, the thesis proposes a parallel ensemble classification algorithm built on MapReduce (Ensemble classification using MapReduce, EMapReduce) that processes data streams in parallel.
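The diversity-based selection in the second contribution can be sketched in a similar way. The abstract does not give the exact entropy formula or acceptance rule of Increment_SEC, so the following Python sketch makes its own assumptions: diversity is the mean Shannon entropy of the members' votes per sample, and a candidate base classifier is kept only if it raises that diversity by at least `min_diversity_gain` without lowering majority-vote accuracy on a shared validation window. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(predictions):
    """Mean Shannon entropy (bits) of the members' votes per sample.

    predictions: list of per-classifier prediction lists, all the same length.
    0.0 means the members always agree; larger values mean more disagreement.
    """
    n_members = len(predictions)
    n_samples = len(predictions[0])
    total = 0.0
    for j in range(n_samples):
        counts = Counter(p[j] for p in predictions)
        total -= sum(
            (c / n_members) * math.log2(c / n_members) for c in counts.values()
        )
    return total / n_samples


def majority_accuracy(predictions, labels):
    """Accuracy of an unweighted majority vote (ties go to the first-seen vote)."""
    correct = 0
    for j, y in enumerate(labels):
        vote = Counter(p[j] for p in predictions).most_common(1)[0][0]
        correct += vote == y
    return correct / len(labels)


def select_incrementally(candidates, labels, min_diversity_gain=0.01):
    """Greedy incremental selection of base classifiers.

    candidates: prediction lists on a shared validation window, in arrival order.
    Returns the indices of the candidates that were kept.
    The acceptance rule is an illustrative assumption, not the thesis's rule.
    """
    kept = [0]  # the first base classifier is always kept
    for i in range(1, len(candidates)):
        current = [candidates[k] for k in kept]
        trial = current + [candidates[i]]
        gain = vote_entropy(trial) - vote_entropy(current)
        no_worse = majority_accuracy(trial, labels) >= majority_accuracy(current, labels)
        if gain >= min_diversity_gain and no_worse:
            kept.append(i)
    return kept
```

A candidate that duplicates an existing member adds no vote entropy and is rejected, which is the redundancy case the paragraph describes. A candidate that disagrees with the current members on some samples is kept as long as the ensemble's accuracy does not drop.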
Keywords/Search Tags: Data Stream Mining, Scenario Characteristics, Ensemble Classifier, Concept Drift, Cloud Computing