Research On Ensemble Classification Algorithm Of Data Stream In Nonstationary Environment

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: H K Mo
GTID: 2428330623451389
Subject: Computer technology
Abstract/Summary:
In some practical applications, data generated over time can be viewed as a data stream from a nonstationary environment. In such a stream the data distribution evolves over time, a phenomenon known as concept drift. Ensemble classification algorithms are commonly used to classify such streams, because the ensemble classifier built while processing the stream has a modular structure that offers a natural way to adapt to changes in the data distribution. There are usually two ways to process data when constructing an ensemble classifier: online learning, which processes instances one by one, and chunk-based learning, which divides the data into equal-sized chunks and treats each chunk as a processing unit. Based on these two approaches, this thesis proposes the ROAUE (Recalling Online Accuracy Updated Ensemble) algorithm and the MAUE (Memorizing Based Accuracy Updated Ensemble) algorithm, respectively.

The ROAUE algorithm is an online ensemble classification algorithm that uses past knowledge to update the ensemble classifier. It processes instances one by one and, each time a full window of instances has been processed, adds a new classifier trained on the latest data in the window. Once the number of base classifiers reaches the preset upper limit, one base classifier is selected and replaced by the newly trained classifier. The selection considers not only the current weight of each base classifier but also its past knowledge, which is computed by storing some of the classifier's former weight values and multiplying them by a sigmoid function to obtain a weighted value. The two values are then combined for each base classifier, and the classifier with the poorest combined value is selected for replacement. In this way, the performance of each base classifier is evaluated more comprehensively during selection. Experimental results on different data stream datasets show that ROAUE reacts better to disturbances in the data stream and achieves higher classification accuracy than four other representative online ensemble classification algorithms.

The MAUE algorithm is a chunk-based ensemble classification algorithm that uses a forgetting mechanism to update the ensemble classifier. When a data chunk is processed, a classifier newly trained on that chunk is added to the ensemble, and the weight of each base classifier is calculated from its classification performance on the current chunk. Based on these weights, a certain proportion of the base classifiers is selected and their parameters are updated; these parameters are used to calculate the memory intensity of each base classifier according to the Ebbinghaus forgetting curve. When the number of base classifiers reaches the preset upper limit, the base classifier with the lowest memory intensity is replaced by the newly trained classifier. Experimental results on different data stream datasets show that MAUE further improves classification accuracy and has clear advantages over four other representative classification algorithms in dealing with fast, sudden, recurring concept drift.
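The replacement rule described for ROAUE can be illustrated with a minimal sketch. The abstract does not give the exact sigmoid form, the history length, or how the two values are combined, so everything below is an assumption made for illustration: the function names (`past_knowledge`, `combined_score`, `select_replacement`), the centred sigmoid coefficients that favour more recent weights, and the mixing parameter `alpha` are all hypothetical and are not taken from the thesis.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def past_knowledge(history):
    """Sigmoid-weighted average of stored former weights (oldest first).

    Assumption: more recent entries receive larger sigmoid coefficients.
    """
    n = len(history)
    if n == 0:
        return 0.0
    # Centre the sigmoid inputs so old entries get small coefficients
    # and recent entries get large ones.
    coeffs = [sigmoid(i - (n - 1) / 2.0) for i in range(n)]
    return sum(c * w for c, w in zip(coeffs, history)) / sum(coeffs)


def combined_score(current_weight, history, alpha=0.5):
    """Mix the current weight with the past-knowledge value.

    Assumption: a simple convex combination controlled by alpha.
    """
    return alpha * current_weight + (1.0 - alpha) * past_knowledge(history)


def select_replacement(members, alpha=0.5):
    """Return the index of the base classifier with the poorest combined score.

    members is a list of (current_weight, history) pairs.
    """
    scores = [combined_score(w, h, alpha) for w, h in members]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical ensemble: the first classifier has the lowest current
# weight but a strong history, so it is not the one chosen.
ensemble = [
    (0.2, [0.9, 0.9, 0.9]),
    (0.3, [0.1, 0.1, 0.1]),
    (0.8, [0.8, 0.8, 0.8]),
]
victim = select_replacement(ensemble)
```

With these made-up numbers the second classifier is selected, even though the first has the lowest current weight. This is the kind of case where taking past weights into account changes which classifier is replaced.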
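The forgetting mechanism described for MAUE can be sketched in a similar way. The Ebbinghaus forgetting curve is commonly written as R = exp(-t / S), where t is the time since the last rehearsal and S is the memory strength. The abstract does not say which parameters MAUE updates or how, so the sketch assumes that each selected classifier has its elapsed time reset and its strength increased by one. The `Member` class, the `proportion` parameter, and both update rules are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Member:
    """Bookkeeping for one base classifier (the model itself is omitted)."""
    name: str
    weight: float          # weight on the current data chunk
    strength: float = 1.0  # S in R = exp(-t / S); assumed update rule below
    elapsed: int = 0       # t: chunks since the last "rehearsal"

    def memory_intensity(self):
        """Retention according to the Ebbinghaus forgetting curve."""
        return math.exp(-self.elapsed / self.strength)


def update_memory(members, proportion=0.5):
    """Age every member, then rehearse the top-weighted proportion.

    Assumption: rehearsal resets elapsed time and adds 1 to the strength.
    """
    for m in members:
        m.elapsed += 1
    k = max(1, int(len(members) * proportion))
    top = sorted(members, key=lambda m: m.weight, reverse=True)[:k]
    for m in top:
        m.elapsed = 0
        m.strength += 1.0


def weakest(members):
    """Member with the lowest memory intensity, i.e. the replacement candidate."""
    return min(members, key=lambda m: m.memory_intensity())


# Two hypothetical chunks with made-up weights.
ensemble = [Member("a", 0.9), Member("b", 0.5), Member("c", 0.1), Member("d", 0.7)]
update_memory(ensemble)                      # chunk 1: "a" and "d" are rehearsed
for m, w in zip(ensemble, [0.9, 0.8, 0.1, 0.2]):
    m.weight = w                             # new weights on chunk 2
update_memory(ensemble)                      # chunk 2: "a" and "b" are rehearsed
candidate = weakest(ensemble)
```

In this toy run, classifier "c" is never among the top-weighted members, so its memory intensity decays fastest and it becomes the replacement candidate. Classifier "d", rehearsed once, decays more slowly because of its higher assumed strength.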
Keywords/Search Tags:Data stream, Concept drift, Ensemble classifier, Nonstationary environment