In recent years, with the rapid development of computer technology, more and more fields have generated continuous data streams, such as the Internet of Things (IoT), cloud computing, smart homes, and financial services. Private networks also generate massive data streams, such as satellite-based Earth survey data, radar-derived weather monitoring data, and atmospheric radiation measurements. Unlike traditional static data, these streams are a new type of data object that is dynamic, continuous, fast, and transient. Because of these characteristics, the analysis and processing of data streams face enormous challenges and have become a research hotspot in the field of big data processing.

As an essential task in big data and machine learning, classification (fault detection, credit card fraud detection, medical diagnosis, etc.) aims to extract models that describe essential class labels from large amounts of data and to predict future trends in the data distribution. Traditional classification methods (such as conventional neural networks, support vector machines, and decision trees) are supervised learning methods: the model achieves the desired classification performance by fitting the classifier's parameters to labeled samples. These algorithms are usually suited to small static datasets, because they need all the data in advance and scan it multiple times. However, a data stream in practical application scenarios is a real-time, continuous, dynamically changing, infinite sequence. In addition, concept drift, class imbalance, and missing class labels in the data stream pose further challenges to existing data stream classification algorithms. Therefore, this work addresses the problem of data stream classification with concept drift and explores the design of efficient data stream classification algorithms. The main innovations are as follows:

(1) This paper proposes a new
dynamic incremental ensemble classification algorithm (DIEFC) for data stream classification problems with concept drift, which combines fuzzy self-organizing classifiers with a dynamic weighting algorithm. First, we train multiple fuzzy self-organizing classifiers to form an ensemble classification framework for dynamic data stream classification. Second, we improve the DWMIL algorithm to adaptively assign weights to the classifiers, add and remove classifiers dynamically, and output results through a weighted voting mechanism. Next, we add a data sampling module to the ensemble model to incrementally update the existing base classifiers, which improves the classification performance of the whole ensemble and helps it adapt to dynamic data streams. Finally, we conduct comparative experiments in an industrial IoT setting and on twelve synthetic datasets. Compared with the comparison methods, DIEFC improves the AUC and G-mean indicators by 5% each, and across all datasets its average training is 50% faster. These results validate the classification performance and speed advantages of DIEFC and show that it is more suitable for classification problems in industrial IoT than similar data stream classification methods.

(2) This paper proposes a dynamic integrated LightGBM algorithm based on oversampling to deal with anomaly detection in IoT data streams. To balance the data, the Borderline-SMOTE method generates minority-class samples in the training set so that the classes reach a balanced ratio, which alleviates the data imbalance problem. We use datasets from real application scenarios to evaluate the anomaly detection performance of the proposed BSDWLGB model. We also compare it with recently proposed anomaly detection algorithms and verify that the proposed model is more suitable for anomaly detection in IoT scenarios than
similar data stream classification methods.

(3) This paper proposes a random undersampling algorithm based on bagging to address the imbalance between abnormal and normal samples. To reduce detection time while accounting for the dynamic and continuous nature of the data, we propose an adaptive ensemble random fuzzy (AERF) algorithm with random fuzzy rules (RFRB) and a dynamic weighting strategy. AERF increases the diversity of the classifiers and improves the detection performance of the whole ensemble model. We use two real-world and five benchmark datasets to evaluate the anomaly detection performance of AERF in a cloud computing environment, and we compare it with recently proposed anomaly detection algorithms. Experimental results show that AERF is more suitable for anomaly detection problems in cloud computing than other data stream classification methods.

(4) To address the problem of missing class labels in the data stream, we propose an incremental dynamic weighted semi-supervised learning method (DWSSL). We combine the fuzzy rule-based (FRB) method with the dynamic weighting algorithm to obtain the DWSSL classification algorithm, which strengthens the relationship between the classifiers and improves classification performance. To make full use of the limited labeled data and further improve the model's performance, we update the ensemble model through incremental learning. We use six real-world datasets to evaluate the proposed method on the data stream classification problem in the IoT environment. We also compare it with semi-supervised and supervised data stream classification algorithms, verifying that the proposed method is more suitable for classification problems in the IoT environment than similar data stream classification methods.
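The ensemble mechanism that recurs across these contributions (per-chunk training, dynamic weighting, adding and removing classifiers, weighted voting, incremental updates from sampled data) can be illustrated with a minimal sketch. It is not the DIEFC, AERF, or DWSSL implementation. The nearest-centroid base learner is a stand-in for the fuzzy classifiers, and the weight-decay rule, the pruning threshold, the sampling rate, and all class and function names are assumptions made for illustration only.

```python
import random


class CentroidClassifier:
    """Toy base learner (assumption): one running centroid per class."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of samples seen

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            s = self.sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                s[i] += v
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict_one(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.counts, key=sq_dist)


class DynamicWeightedEnsemble:
    """Chunk-based ensemble with hypothetical weighting and pruning rules."""

    def __init__(self, prune_threshold=0.1, sample_rate=0.5, seed=0):
        self.members = []  # list of [classifier, weight] pairs
        self.prune_threshold = prune_threshold
        self.sample_rate = sample_rate
        self.rng = random.Random(seed)

    def predict_one(self, x):
        if not self.members:
            return None  # nothing trained yet
        votes = {}
        for clf, weight in self.members:
            label = clf.predict_one(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)  # weighted majority vote

    def process_chunk(self, X, y):
        # 1) Decay each weight by the member's error rate on the new chunk.
        for member in self.members:
            errors = sum(member[0].predict_one(x) != t for x, t in zip(X, y))
            member[1] *= 1.0 - errors / len(X)
        # 2) Remove members whose weight fell below the threshold.
        self.members = [m for m in self.members
                        if m[1] >= self.prune_threshold]
        # 3) Incrementally update survivors on a random sample of the chunk.
        keep = [i for i in range(len(X)) if self.rng.random() < self.sample_rate]
        for clf, _ in self.members:
            clf.partial_fit([X[i] for i in keep], [y[i] for i in keep])
        # 4) Train a new member on the chunk and add it with weight 1.
        new_clf = CentroidClassifier()
        new_clf.partial_fit(X, y)
        self.members.append([new_clf, 1.0])


def make_chunk(rng, n=200, flipped=False):
    """Two Gaussian blobs; flipped=True swaps the labels (abrupt drift)."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label == 1 else -2.0
        X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
        y.append(1 - label if flipped else label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    ensemble = DynamicWeightedEnsemble()
    for flipped in (False, False, True):
        ensemble.process_chunk(*make_chunk(rng, flipped=flipped))
        print(len(ensemble.members), ensemble.predict_one([2.0, 2.0]))
```

In this toy stream the third chunk swaps the labels. The members trained before the drift misclassify almost every sample of that chunk, so their weights decay toward zero and they are pruned, and the newly trained member then determines the vote. A real system would replace the centroid learner with the chosen base classifier and tune the threshold and sampling rate to the stream.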