
Research On Active Learning Method For Imbalanced Data Streams

Posted on: 2024-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: L Ren
Full Text: PDF
GTID: 2568307115457664
Subject: Software engineering
Abstract/Summary:
In the real world, data in many applications (such as online shopping, stock markets and healthcare) arrives in the form of streams. Compared with traditional data, data streams are real-time, continuous, time-ordered, changing and unbounded, so extracting meaningful information from them effectively is increasingly important. Data stream classification is an important research task in data stream mining; its goal is to capture the changing class structure in massive amounts of data that arrive in real time and change constantly.

Traditional data stream classification algorithms rely on supervised learning, which ignores the high cost of labeled samples, does not account for class distributions in multi-class imbalanced data streams changing over time, and leaves the original classifier inapplicable once concept drift occurs. Data stream classification also faces further problems: an excess of outliers makes it difficult to select training samples for model reconstruction, and the original classifier cannot adapt to new classes that appear after concept evolution. These problems degrade the classification accuracy of data streams. In response, an online active learning method for imbalanced data streams and an online active learning method for concept-evolving data streams are proposed. The specific work in this thesis is as follows:

(1) The background and significance of data stream classification, the current state of research, the characteristics of data streams, and the problems of class imbalance, concept drift and concept evolution encountered during data stream classification are introduced, and a formal definition of the problem is given.

(2) An online active learning method for imbalanced data streams is proposed to address multi-class imbalance, concept drift and the high cost of labeled samples. Specifically, a definition of initial sample weights that contains an adaptive forgetting factor is proposed, making the AdaBoost.M2 method applicable to imbalanced data streams; an adaptive adjustment method for the margin threshold matrix, based on the degree of uncertainty of sample classification, is proposed, an active learning model based on a hybrid label request strategy is constructed, and an active learning framework for imbalanced drifting data streams is proposed; and a concept drift index based on classification bias is proposed and introduced into the time decay mechanism, where it can be used for model reconstruction.

(3) An online active learning method for concept-evolving data streams is proposed to address multi-class imbalance, concept evolution and an excess of outliers. Specifically, a distance-based outlier detection mechanism is proposed that constructs a decision boundary from training samples and uses it to determine whether an input sample is an outlier; a concept evolution detection mechanism is proposed that, in combination with the active learning mechanism for data streams, uses multiple fixed-length sliding windows to store specific samples and periodically checks whether enough similar samples have appeared in the sliding window of outliers, thereby handling concept evolution during data stream classification; and the weight formula for sample retraining is redefined to reflect the different retraining tasks of the ensemble classifier after concept drift and after concept evolution.

(4) An active learning system for imbalanced data streams is developed. Based on the methods proposed in this thesis, the system implements active learning for data streams, concept drift detection, concept evolution detection and outlier detection.
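
As an illustration of item (3), the following is a minimal Python sketch of how distance-based outlier detection and a sliding window of outliers could fit together. The abstract does not give the thesis's actual boundary or similarity definitions, so everything here is an assumption made for illustration: the per-class centroid-plus-radius boundary, the class names DistanceOutlierDetector and EvolutionDetector, and the parameters slack, window_size, min_similar and similarity_radius.

```python
import math
from collections import deque


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class DistanceOutlierDetector:
    """Hypothetical boundary: one centroid and radius per known class."""

    def __init__(self, slack=1.0):
        self.slack = slack      # assumed multiplier that loosens the boundary
        self.boundaries = {}    # label -> (centroid, radius)

    def fit(self, samples, labels):
        by_class = {}
        for x, y in zip(samples, labels):
            by_class.setdefault(y, []).append(x)
        for y, xs in by_class.items():
            dim = len(xs[0])
            centroid = [sum(x[i] for x in xs) / len(xs) for i in range(dim)]
            # Radius is the farthest training sample from the class centroid.
            radius = max(euclidean(x, centroid) for x in xs)
            self.boundaries[y] = (centroid, radius)

    def is_outlier(self, x):
        # Outlier if x lies outside every class boundary
        # (with no fitted classes, every sample counts as an outlier).
        return all(
            euclidean(x, centroid) > self.slack * radius
            for centroid, radius in self.boundaries.values()
        )


class EvolutionDetector:
    """Hypothetical check for a new class inside a fixed-length outlier window."""

    def __init__(self, window_size=50, min_similar=5, similarity_radius=1.0):
        self.outlier_window = deque(maxlen=window_size)  # oldest entries drop out
        self.min_similar = min_similar
        self.similarity_radius = similarity_radius

    def add_outlier(self, x):
        self.outlier_window.append(x)

    def check(self):
        """Return the outliers near some seed outlier if there are enough, else None."""
        for seed in self.outlier_window:
            group = [
                x for x in self.outlier_window
                if euclidean(seed, x) <= self.similarity_radius
            ]
            if len(group) >= self.min_similar:
                return group
        return None
```

In a streaming loop, each arriving sample would first be passed to is_outlier; outliers would be pushed into the window with add_outlier, and check would be called periodically. A non-empty result is a candidate new class that could be sent to the labeling oracle before the classifier is retrained.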
Keywords/Search Tags:Imbalanced data streams, Active learning, Concept drift, Concept evolution, Outlier detection