
Research On Data Stream Classification Algorithm Based On Ensemble Learning

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Guo
GTID: 2428330620463408
Subject: Software engineering
Abstract/Summary:
With rapid social development and the rapid increase of information, data in many fields have grown explosively, and this growth has brought the world into the era of big data. How to mine useful information from massive data has become an important concern. Classification is one of the main techniques in machine learning and data mining. Its main idea is to train a classifier on existing labeled data and use it to predict the labels of unknown data. Traditional classification methods are designed for static data, and the models they build are fixed, which makes them poorly suited to dynamic data streams. A data stream is characterized by unbounded length, fast arrival, the need for timely response, concept drift, class imbalance and so on. How to handle these characteristics so that the stream is classified correctly is the focus of current classification research. Because a data stream can change at any time, a data stream classification model must also adjust continuously as the data change. Concept drift and class imbalance both occur in data streams, so how to detect them effectively and how to handle them are the problems that data stream classification needs to solve. This thesis therefore studies data stream classification in depth under two different stream processing modes. The main contents cover the following two aspects.

(1) The data stream is divided into blocks, and a window mode is adopted to measure the classification ability of the model on two consecutive data blocks. A dynamic data stream classification algorithm is proposed that combines concept drift detection based on the Kappa coefficient with the SMOTE sampling method for class imbalance. During classification, the algorithm calculates the Kappa coefficient of the classification result on each data block and uses it to detect whether the concepts of consecutive blocks are consistent, that is, whether concept drift has occurred. A change of concept in the data stream indicates that data imbalance may have occurred. If the data stream is imbalanced, the SMOTE sampling method is used to balance the data, and all classifiers that no longer meet the requirements are promptly eliminated according to the existing knowledge. The sampled data are then used to train a new classifier, which is added to the ensemble. The experimental results show that the classification performance of this algorithm is significantly better than that of similar algorithms.

(2) To process data streams online in a timely manner, a data stream classification method based on online learning is proposed. The algorithm processes each instance as it arrives. It uses an Online Bagging ensemble classifier, in which the Poisson distribution determines how many times each instance is used to update each base classifier. As in the first part, the Kappa coefficient is used to detect whether the concept has changed. The attenuation factor tw is updated according to the classification result, and the Poisson distribution is used to obtain the number of updates, which achieves the effect of resampling and thereby handles the imbalance problem. Experimental results show that the algorithm can not only detect concept drift but also improve classification performance.

Aiming at concept drift and class imbalance in data streams, this thesis proposes two data stream classification algorithms based on ensemble learning, which not only detect concept drift in the data stream effectively but also improve the classification accuracy on the minority class. However, both methods have limitations: they depend heavily on the internal structure of the data set. How to combine concept drift detection and imbalance handling effectively remains to be studied further.
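
The two mechanisms the abstract relies on, a per-block Kappa coefficient for drift detection and Poisson-driven updates in Online Bagging, can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the `MajorityLearner` stand-in, the drop threshold of 0.2 and the use of a fixed `lam` are all assumptions made for illustration. The thesis instead adjusts an attenuation factor from the classification results, and that rule is not given in the abstract.

```python
import math
import random
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_chance == 1.0:
        # Only one class appears in labels and predictions; treating this
        # as perfect agreement is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


def drift_detected(kappa_prev, kappa_curr, threshold=0.2):
    """Flag drift when kappa falls by more than `threshold` (assumed value)
    between two consecutive data blocks."""
    return (kappa_prev - kappa_curr) > threshold


def poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class MajorityLearner:
    """Hypothetical stand-in base learner: predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None


def online_bagging_update(learners, x, y, lam, rng):
    """Online Bagging step: each base learner sees (x, y) k ~ Poisson(lam) times.
    Passing a larger lam for minority-class instances oversamples them
    (an assumed way to handle imbalance, not the thesis's exact rule)."""
    for learner in learners:
        for _ in range(poisson(lam, rng)):
            learner.learn_one(x, y)
```

In this sketch a block-based detector would call `cohen_kappa` once per data block and compare consecutive values with `drift_detected`, while an online learner would call `online_bagging_update` once per arriving instance.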
Keywords/Search Tags: Data stream classification, Ensemble classifier, Concept drift, Imbalanced, Sampling