
Research On Classification Algorithms Of Concept Drift And Imbalanced Data Streams

Posted on: 2020-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: J F Guo
Full Text: PDF
GTID: 2428330590971596
Subject: Electronic and communication engineering
Abstract/Summary:
In recent years, with the rapid development of big data and cloud computing, large volumes of data streams are continuously generated on the Internet and in other domains. To extract and analyze the useful information hidden in these streams, scholars have carried out in-depth research on data stream mining. Data streams differ from static data: they are characterized by high speed, continuity, variability, and unbounded length, so traditional data mining algorithms cannot be applied to them directly. In addition, data streams exhibit concept drift, that is, the data distribution changes over time, which makes data stream mining considerably harder. As with static data, data streams also suffer from class imbalance. Both are key and difficult problems that urgently need to be solved in stream mining. This thesis therefore focuses on concept drift and class imbalance in data streams. Its main work is as follows.

First, to address concept drift in data streams, the thesis introduces a concept drift detection algorithm based on data distribution, which consists of a concept drift detection module and a recurring-concept detection module. The algorithm not only handles concept drift in the data stream but also detects recurring concepts. First, concept drift in the data stream is detected using the concept drift detection algorithm. Finally, the algorithm is verified and analyzed on MOA. The results show that the algorithm has a low false-alarm rate and low detection delay, which improves classification performance and also identifies concept recurrence within concept drift.

Second, to address the classification of class-imbalanced data streams with concept drift, the thesis proposes an ensemble classification algorithm for imbalanced data streams. The algorithm first handles class imbalance: it applies up-sampling and then down-sampling, increasing the positive samples and reducing the negative samples, which reduces overfitting and balances the data stream. Next, the classifier weights are updated periodically within the ensemble to cope with concept drift. When a classifier's weight is updated dynamically, both its classification accuracy on the current data block and the cost of its misclassifications on that block are taken into account. For classifier elimination, the contribution of each classifier to the ensemble is calculated, and classifiers are replaced according to this contribution value. Finally, extensive validation and analysis are carried out on MOA, a machine learning experimental platform for data streams, and the results show that the algorithm achieves high classification accuracy.
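The block-level steps described for the ensemble algorithm (up-sampling then down-sampling, followed by a weight update that combines accuracy with misclassification cost) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `rebalance_block` and `block_weight`, the midpoint balancing target, the cost values `fn_cost` and `fp_cost`, and the formula `accuracy * (1 - normalised cost)` are all assumptions chosen for illustration, since the abstract does not give the exact procedure.

```python
import random


def rebalance_block(block, positive_label, seed=0):
    """Balance one data block of (features, label) pairs.

    Assumed scheme (not from the thesis): the positive class is the
    minority; it is up-sampled by random duplication and the negative
    class is then down-sampled by random removal, so that both classes
    meet at the midpoint of their original sizes.
    """
    rng = random.Random(seed)
    pos = [s for s in block if s[1] == positive_label]
    neg = [s for s in block if s[1] != positive_label]
    if not pos or len(pos) >= len(neg):
        return list(block)  # nothing to balance under this assumption
    target = (len(pos) + len(neg)) // 2
    # Up-sampling: duplicate randomly chosen positive samples.
    pos_up = pos + [rng.choice(pos) for _ in range(target - len(pos))]
    # Down-sampling: keep a random subset of the negative samples.
    neg_down = rng.sample(neg, target)
    balanced = pos_up + neg_down
    rng.shuffle(balanced)
    return balanced


def block_weight(predictions, labels, positive_label,
                 fn_cost=5.0, fp_cost=1.0):
    """Weight of one classifier on the current data block.

    Assumed formula: accuracy * (1 - normalised misclassification cost),
    where missing a positive sample (fn_cost) is more expensive than a
    false positive (fp_cost). The costs are hypothetical values.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    cost = sum(
        fn_cost if y == positive_label else fp_cost
        for p, y in zip(predictions, labels)
        if p != y
    )
    # Worst case: every sample in the block is misclassified.
    max_cost = sum(fn_cost if y == positive_label else fp_cost
                   for y in labels)
    return (correct / n) * (1.0 - cost / max_cost)
```

Under these assumptions, two classifiers with the same accuracy on a block receive different weights when one of them misses positive samples, which is the effect the cost term is meant to capture.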
Keywords/Search Tags: Data streams, Concept drift, Ensemble algorithm, Class imbalance