Research On Online Active Learning For Class-imbalanced Data Stream

Posted on: 2021-01-20  Degree: Master  Type: Thesis
Country: China  Candidate: Y F Zhang  Full Text: PDF
GTID: 2428330611465686  Subject: Software engineering
Abstract/Summary:
Online learning is a machine learning paradigm for time-series mining and has been successfully applied to various time-series classification tasks. In practice, however, these tasks usually face two severe challenges. (1) Class imbalance: the number of samples varies significantly across classes, and severe imbalance may cause the online classifier to misclassify minority-class data. (2) Small sample size: the amount of labeled data can be limited, which may prevent online classifiers from modeling the data distribution, especially under class imbalance. Because of these challenges, most online learning algorithms fail to handle real-world time-series classification applications.

To address these challenges, we study class-imbalanced data streams from two aspects: online classification and online active learning. First, for online classification, we propose a new cost-sensitive online classification algorithm. The algorithm assigns different misclassification costs to different classes and uses the second-order information of samples to adaptively adjust the learning rate. As a result, it improves the model's convergence rate and handles class-imbalanced online classification better. Second, for online active learning, we propose a novel online adaptive asymmetric active learning algorithm. The algorithm applies different class weights to both the model update and the sample query, so it can effectively account for the importance of minority-class data. In addition, by exploiting the second-order information of samples, the algorithm achieves higher query credibility and faster convergence. In other words, it makes more accurate query decisions and adapts quickly to distribution shifts in time-series data, and therefore handles the online active learning problem better.

To verify the proposed algorithms, we conduct extensive theoretical analyses and empirical studies. The theoretical and experimental results demonstrate the effectiveness and superiority of the proposed algorithms. The results also confirm two expectations. (1) The second-order information of samples and a cost-sensitive objective function are beneficial for class-imbalanced online classification. (2) Online active learning should account for class imbalance and second-order information in both model updating and label querying. Together, these help to improve the reliability of query decisions, accelerate the model's convergence, and improve performance in online active learning problems with class imbalance.
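The abstract does not give the update or query rules, so the following is only a minimal sketch of the general idea, not the thesis's algorithms. It assumes a diagonal AROW-style second-order update, a hinge loss scaled by a per-class cost, and a margin-based randomized query rule with a per-class query parameter. The class name `AsymmetricSecondOrderLearner`, the helper `run_stream`, and all parameter names and default values (`cost_pos`, `cost_neg`, `delta_pos`, `delta_neg`, `r`) are hypothetical.

```python
import numpy as np


class AsymmetricSecondOrderLearner:
    """Illustrative sketch only: diagonal second-order online learner with
    class-dependent costs (update) and class-dependent query parameters."""

    def __init__(self, dim, cost_pos=5.0, cost_neg=1.0,
                 delta_pos=2.0, delta_neg=1.0, r=1.0, seed=0):
        self.w = np.zeros(dim)      # mean weight vector
        self.sigma = np.ones(dim)   # diagonal covariance (second-order information)
        self.cost = {1: cost_pos, -1: cost_neg}     # assumed misclassification costs
        self.delta = {1: delta_pos, -1: delta_neg}  # assumed query parameters
        self.r = r                  # regularization constant (assumption)
        self.rng = np.random.default_rng(seed)

    def margin(self, x):
        return float(self.w @ x)

    def variance(self, x):
        # x^T Sigma x for a diagonal Sigma: large when the model is unsure about x.
        return float(np.sum(self.sigma * x * x))

    def should_query(self, x):
        """Randomized query: small or uncertain margins are queried more often."""
        p = self.margin(x)
        v = self.variance(x)
        pred = 1 if p >= 0 else -1
        # Shrink the margin by a bounded uncertainty term in [0, 1).
        adjusted = max(0.0, abs(p) - v / (v + self.r))
        prob = self.delta[pred] / (self.delta[pred] + adjusted)
        return bool(self.rng.random() < prob)

    def update(self, x, y):
        """Cost-weighted hinge-loss update with a per-feature adaptive step."""
        loss = self.cost[y] * max(0.0, 1.0 - y * self.margin(x))
        if loss > 0.0:
            beta = 1.0 / (self.variance(x) + self.r)
            sx = self.sigma * x
            self.w += loss * beta * y * sx   # step scaled by the covariance
            self.sigma -= beta * sx * sx     # confidence grows on features seen


def run_stream(learner, X, y):
    """Process a stream once; labels are used only for queried samples."""
    queried = 0
    for x_t, y_t in zip(X, y):
        if learner.should_query(x_t):
            queried += 1
            learner.update(x_t, int(y_t))
    return queried
```

In this sketch, labels are `+1` (assumed minority class) and `-1`. The diagonal covariance plays the role of the second-order information: it shrinks on frequently seen features, which both reduces the step size in `update` and lowers the uncertainty term in `should_query`.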
Keywords/Search Tags:Online Learning, Class Imbalance, Cost-sensitive Learning, Active Learning