
Research On The Classification Methods For Dynamic Data Stream

Posted on: 2018-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: S L Xu
GTID: 2348330521950102
Subject: Computer software and theory
Abstract/Summary:
With the development of the information society, data streams are being generated at an explosive rate in many fields. How to extract valuable knowledge and patterns from such massive data is a hot topic that has attracted the attention of many researchers. Classification is an important data analysis method: it uses existing data to predict the class labels of unknown data. In traditional classification algorithms, once the classifier has been trained, the classification model is fixed and no longer adjusted, so such a model cannot cope with dynamically changing data streams. Unlike static data, a data stream is typically unbounded in size, arrives quickly, and exhibits concept drift, so mining data streams requires a new algorithmic framework. A data stream reflects real-time information about the data. Compared with traditional classification, the most significant characteristic of data stream classification is that the classifier can be adjusted as time goes on. However, because concept drift exists in data streams, how to detect it is a problem that must be solved. Building on the traditional classification framework, this thesis makes a systematic study of data stream classification and achieves the following results:

1. A fast extreme learning machine algorithm for data stream classification is proposed. The algorithm uses ELM (Extreme Learning Machine), a single-hidden-layer neural network, as the base classifier and introduces an online sequential learning mechanism to handle the data stream environment. Because the number of hidden-layer nodes of an ELM is not easy to determine, the algorithm uses a fast binary search to choose it. Concept drift is detected from the magnitude of the change in classification accuracy between two adjacent data blocks. The proposed algorithm addresses the problems that traditional neural networks learn slowly, incur a large time overhead, and cannot be applied directly to data stream classification. The experimental results show that the algorithm not only detects concept drift effectively but also achieves high accuracy.

2. Considering how different types of concept drift in a data stream affect classifier performance, this thesis presents a dynamic extreme learning machine algorithm with an adaptive adjustment mechanism. It gives a method for dynamically adjusting the number of hidden-layer nodes based on classification accuracy. When concept drift is detected, the ELM classifier is retrained so that the new ELM adapts to the new data distribution. The experimental results show that, thanks to the adaptive adjustment mechanism, the algorithm retains the advantages of the original ELM algorithm while reducing the dependence on user experience.

3. Based on the relationship between changes in the amount of information and the data distribution, this thesis proposes an entropy-based ensemble data stream classification algorithm called ECBE. ECBE trains multiple classifiers in the training phase, and the classifier weights are determined by the change between the earlier and later entropy values of the classification results. The Hoeffding bound is used to determine whether concept drift has occurred. When concept drift appears, the system adjusts the classifiers according to their weights. Compared with existing algorithms, ECBE not only detects concept drift effectively but also achieves better classification results.

4. Aiming at the detection of gradual concept drift, a data stream classification algorithm combined with unsupervised learning is proposed. The algorithm is based on ensemble classification, and attribute reduction is introduced into the classification process. By comparing the accuracies of the classification and clustering results, it judges whether concept drift has occurred, which avoids the limitation that the accuracy-based detection used above is sensitive only to abrupt concept drift. The experimental results show that the proposed algorithm performs well under both abrupt and gradual concept drift and is robust.

5. Under concept drift, the key is to evaluate the classifier's ability to classify two adjacent data blocks. The Kappa coefficient is an important measure of the agreement between two variables. To address data stream classification, an algorithm based on the Kappa coefficient is proposed. The Kappa coefficient of each block is calculated during classification and is then used to detect changes of concept in the data stream. The experimental results show that the algorithm adapts to dynamic changes in the data more quickly and has clear advantages in time consumption and classification accuracy.

In summary, to solve the problem of classifying data streams that contain concept drift, this thesis proposes a series of data stream classification algorithms based, respectively, on unsupervised learning, the Kappa coefficient, information entropy, ELM, and ELM with a double-hidden-layer mechanism. The experimental results demonstrate that the algorithms not only detect concept drift effectively but also obtain good classification results. These research results have important theoretical significance and wide application value for research on data stream classification methods.
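The block-wise Kappa idea in result 5 can be illustrated with a minimal Python sketch. This is not the thesis's algorithm, which the abstract does not spell out. It only computes Cohen's Kappa for each data block and flags a block when its Kappa falls sharply relative to the preceding block. The function names `cohen_kappa` and `detect_drift` and the threshold `max_drop` are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def cohen_kappa(y_true: Sequence, y_pred: Sequence) -> float:
    """Cohen's Kappa between the true and predicted labels of one data block."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("blocks must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of instances where prediction equals truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate block: one label everywhere, so agreement is perfect.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def detect_drift(kappas: Sequence[float], max_drop: float = 0.2) -> List[int]:
    """Indices of blocks whose Kappa fell by more than max_drop
    compared with the immediately preceding block.

    max_drop is an assumed, illustrative threshold, not a value from the thesis.
    """
    return [
        i for i in range(1, len(kappas))
        if kappas[i - 1] - kappas[i] > max_drop
    ]


if __name__ == "__main__":
    # Toy stream of (true labels, predicted labels) blocks, made up for the demo.
    blocks = [
        ([0, 0, 1, 1], [0, 0, 1, 1]),
        ([0, 1, 0, 1], [0, 1, 0, 1]),
        ([0, 0, 1, 1], [0, 1, 0, 1]),
    ]
    kappas = [cohen_kappa(t, p) for t, p in blocks]
    print("per-block Kappa:", kappas)
    print("blocks flagged as drift:", detect_drift(kappas))
```

In a real stream the flagged index would trigger whatever adaptation the classifier uses, for example retraining on the newest block. A fixed drop threshold is the simplest possible rule; a statistically derived bound could replace it.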
Keywords/Search Tags: Data stream classification, Concept drift, Ensemble classification, Information entropy, Extreme learning machine