With the rapid development of 5G applications and Internet of Things (IoT) technology, the application scenarios of data stream mining are becoming increasingly diverse. For example, demand is growing in fields such as agricultural IoT data mining, network intrusion detection on data streams, and recommendation systems. Compared with traditional static data, evolving data streams typically exhibit the following characteristics: (1) they have strong real-time requirements and large data volumes, which make data labeling costly and difficult; (2) their distribution is dynamic and evolving, making it difficult for traditional models to adapt to distribution changes; (3) class imbalance is dynamic, in that both the proportions of the imbalanced classes and the classes themselves may change over time as the stream evolves. Although many studies have addressed these issues, existing work still has shortcomings: (1) most methods learn directly from the raw data, without a more detailed description of the data, and the raw data may not satisfy the assumptions of semi-supervised learning; (2) when unlabeled data are used for training, the reliability of the samples is not analyzed further, which may lead to unstable and unreliable algorithms; (3) existing methods detect class imbalance inflexibly and may not effectively enhance the representation of minority classes.

To address these issues, this study makes the following main contributions. Firstly, because the raw embedding of a data stream may not satisfy semi-supervised learning assumptions, this study proposes a deep metric-based multi-proxy semi-supervised class-balanced embedding learning algorithm. The algorithm learns a class-balanced low-dimensional embedding through an end-to-end class-balanced semi-supervised network. Then, on top of the class-balanced embedding, multi-proxy metric learning is applied to make the final embedding more discriminative and more consistent with semi-supervised learning assumptions, thereby improving the reliability of the semi-supervised data stream model.

Secondly, to address the reliability of semi-supervised data stream algorithms under concept drift, this study proposes a dynamic micro-cluster-based reliability maintenance strategy built on the aforementioned embedding. The algorithm maintains a reliability value for each micro-cluster and trains the model only on instances whose predicted reliability exceeds a given threshold. In addition, it updates the reliability of neighboring micro-clusters according to the local distribution consistency of the labeled data. The algorithm also maintains the latest unlabeled micro-clusters and applies a reliable label propagation method to them.

Thirdly, to address inflexible detection and inadequate representation of minority classes in data stream classification, this study proposes a method for detecting and handling class imbalance based on reliable micro-clusters. The method uses micro-clusters to count the instances of each class and introduces an approach based on oversampling with artificially generated minority-class samples. When class imbalance is detected, the method generates a certain number of minority-class samples to improve the model's ability to learn from the minority class.

Extensive experiments on both real and artificial datasets show that the proposed algorithm effectively improves the reliability and effectiveness of the model.
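
The micro-cluster-based imbalance handling described above can be illustrated with a minimal Python sketch. It is not the implementation from this study: the `MicroCluster` fields, the reliability threshold, the imbalance ratio, and the use of SMOTE-style linear interpolation between micro-cluster centers are all assumptions made for illustration only.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary; all field names are assumptions."""
    center: list
    label: int
    count: int
    reliability: float


def reliable_class_counts(clusters, threshold=0.5):
    """Count instances per class, using only sufficiently reliable micro-clusters."""
    counts = Counter()
    for mc in clusters:
        if mc.reliability >= threshold:
            counts[mc.label] += mc.count
    return counts


def minority_classes(counts, ratio=0.5):
    """Flag classes whose count falls below `ratio` times the largest class count."""
    if not counts:
        return []
    largest = max(counts.values())
    return sorted(c for c, n in counts.items() if n < ratio * largest)


def oversample(centers, n_new, rng):
    """Generate points by interpolating between randomly chosen centers (SMOTE-style)."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(centers)
        b = rng.choice(centers)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


clusters = [
    MicroCluster([0.0, 0.0], label=0, count=40, reliability=0.9),
    MicroCluster([1.0, 1.0], label=0, count=30, reliability=0.8),
    MicroCluster([5.0, 5.0], label=1, count=6, reliability=0.7),
    MicroCluster([6.0, 5.0], label=1, count=4, reliability=0.6),
    MicroCluster([9.0, 9.0], label=1, count=50, reliability=0.1),  # unreliable, ignored
]

counts = reliable_class_counts(clusters, threshold=0.5)
minority = minority_classes(counts, ratio=0.5)

# Oversample each detected minority class from its reliable micro-cluster centers.
rng = random.Random(0)
centers = [mc.center for mc in clusters
           if mc.label in minority and mc.reliability >= 0.5]
synthetic = oversample(centers, n_new=5, rng=rng)
```

In this toy setting, the low-reliability micro-cluster is excluded from the class counts, so class 1 is flagged as a minority class even though it would look balanced if every micro-cluster were counted. This is one way reliability could interact with imbalance detection, under the assumptions stated above.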