Research On Data Streams Classification With Concept Drift

Posted on: 2019-07-22
Degree: Master
Type: Thesis
Country: China
Candidate: J Wang
GTID: 2428330545491289
Subject: Computer application technology
Abstract/Summary:
Data classification is a classic and important problem in data mining and continues to attract considerable attention. However, with the development of the Internet of Things and the arrival of the era of big data, traditional classification methods face new challenges, one of which is that data has shifted from traditional static data sets to dynamic data streams. Data streams are a new data type that is now widely used in many fields, and how to mine valuable information from them has become a hot research topic.

Data streams are fast, unbounded, continuous, and variable. They also contain noise, exhibit concept drift, and lack a large number of class labels, so traditional classification models have difficulty adapting to them. In addition, most existing classification models suffer from problems such as slow classification speed, and they are all based on the assumption that sample labels are known when the model is constructed. In practice, the labels of samples in a data stream are unknown, and labeling them consumes a great deal of manpower, money, and time, so this assumption generally does not hold in real applications. How to construct a classification model that fits the characteristics of data streams and classifies them effectively has therefore become a hot issue in the academic community, and research on classification methods for concept-drifting data streams has significant research and application value.

Although some results have been achieved in data stream classification, obvious problems remain for data streams that contain noise and concept drift. In view of this, this dissertation focuses on the problem of dynamic data stream classification. Based on ensemble learning and selective ensemble learning, it studies the classification of data streams with implicit noise and concept drift. The main research contents are summarized as follows:

1. The dissertation outlines the basic concepts, research background, and significance of data streams, and summarizes the commonly used processing methods in data stream mining. It also introduces concept drift in data streams and the methods commonly used to handle it, as well as the problems that concept-drifting data stream classification still faces. Finally, it discusses and analyzes current data stream classification models and their characteristics in environments with noise and concept drift, and summarizes the key issues to consider when constructing a classification model for concept-drifting data streams. This lays the foundation for the research work that follows.

2. Ensemble learning is studied in depth. A data stream classification method based on classifier similarity weighting and diversity is designed to solve the problem of classifying data streams with hidden noise and concept drift. The method uses the most recent base classifier as a reference classifier that represents the upcoming concept in the data stream. The similarity between each base classifier and this reference is calculated with the Gower similarity coefficient, and weighted majority voting is performed with these similarities as the base classifier weights. At the same time, the Q-statistic is used to measure the diversity between base classifiers, and this measure drives the strategy for updating and eliminating base classifiers, which improves the diversity of the ensemble classification model. Simulation experiments show that the ensemble classification scheme is feasible and performs well in classification accuracy and stability.

3. Selective ensemble learning is summarized. Ensemble learning has disadvantages when constructing a classification model, such as large ensemble size, long training time, and high time and space complexity. To address them, the dissertation proposes a selective ensemble data stream classification method based on ant colony optimization. When selecting base classifiers, the method considers both their classification accuracy and their diversity, and uses the optimization ability of the ant colony optimization algorithm to select base classifiers with high accuracy and individual diversity to build the ensemble classification model. The ensemble model is then evaluated on standard simulated data sets. The results show that the method significantly improves accuracy and stability compared with traditional ensemble methods.

Finally, the dissertation outlines the open challenges in classifying concept-drifting data streams and briefly discusses future trends in this area.
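The two measures named in the second contribution can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. It assumes that similarity is computed over the labels each classifier predicts on the newest data chunk, in which case the Gower coefficient for nominal values reduces to the fraction of matching predictions. The Q-statistic follows its standard pairwise definition over correct and incorrect predictions. The function names and the toy label vectors are hypothetical.

```python
from collections import defaultdict


def gower_similarity(preds_a, preds_b):
    """Gower similarity for nominal values: share of positions with equal labels."""
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)


def q_statistic(preds_a, preds_b, truth):
    """Pairwise Q-statistic; values near 1 mean the two classifiers err together."""
    n11 = n00 = n10 = n01 = 0
    for a, b, y in zip(preds_a, preds_b, truth):
        a_ok, b_ok = a == y, b == y
        if a_ok and b_ok:
            n11 += 1  # both correct
        elif not a_ok and not b_ok:
            n00 += 1  # both wrong
        elif a_ok:
            n10 += 1  # only the first classifier is correct
        else:
            n01 += 1  # only the second classifier is correct
    denom = n11 * n00 + n01 * n10
    # Assumption: treat the undefined 0/0 case as "no measurable association".
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


def weighted_vote(labels, weights):
    """Weighted majority vote over one sample's predicted labels."""
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


# Hypothetical predictions on the newest chunk (toy data, not from the thesis).
truth = [1, 0, 1, 1, 0, 1]
reference = [1, 0, 1, 1, 0, 0]  # newest base classifier
older = [[1, 0, 1, 0, 0, 1], [0, 1, 0, 1, 0, 0]]

# Each older classifier is weighted by its similarity to the reference.
weights = [gower_similarity(p, reference) for p in older]
diversity = q_statistic(older[0], older[1], truth)
```

In a scheme like the one described, a pair of classifiers with a high Q value would be a candidate for elimination, since the two contribute little diversity to the ensemble.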
Keywords/Search Tags: Data stream classification, Concept drift, Ensemble learning, Selective ensemble learning, ACO algorithm, Classifier diversity