Research On Incremental Learning Algorithm And Application For Data Stream With Concept Drift

Posted on: 2021-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: B C Zhang
GTID: 2428330614971256
Subject: Computer Science and Technology
Abstract/Summary:
Many real-world applications, such as sensor networks and social media, continuously generate large volumes of fast-arriving, real-time data streams. Incrementally building a classifier on such a dynamic data stream is a popular problem in data mining. However, these streams are usually non-stationary: the distribution of the data changes over time. This phenomenon is called concept drift. Concept drift degrades classifier performance, which makes incremental learning on data streams challenging. Although existing research addresses learning on data streams with concept drift, some problems remain unsolved. First, related work usually targets a specific type of concept drift and adapts poorly to data streams in which several types of drift evolution occur together. Second, data streams often exhibit concept drift in the feature space. A dynamically changing feature space causes the form of the decision boundary to change constantly, and existing methods fall short in both computational efficiency and adaptability to this changing boundary. To address these problems, this thesis studies adaptive incremental learning in such complex concept drift environments. The specific contributions are as follows:

(1) An incremental ensemble learning algorithm, DAWE, is proposed to cope with data streams containing mixed evolution types of concept drift. The method combines classification accuracy and diversity to compute the value of each base classifier. Measuring that value from multiple angles helps the ensemble model cope with multiple forms of drift. In addition, to perceive how the concept drift evolves, the thesis quantifies the degree of concept drift using principal component analysis and uses it to adaptively adjust the weights of the base classifiers. The experimental results show that DAWE exceeds the best current related algorithms by 1.14% in average classification accuracy on datasets containing multiple types of concept drift.

(2) An online incremental learning algorithm, OLLDF, is proposed to cope with data streams that drift in the feature space. The method consists of a global linear classifier and multiple local linear classifiers, which together fit the dynamic patterns of the decision boundary. By partitioning the feature set of each instance and combining the partitioned features for online learning, the classifier is able to learn in a dynamic feature space. To improve efficiency, a variance-based method is designed to compute the feature set weights during online optimization of the linear model. The experimental results show that OLLDF is significantly better than the state-of-the-art related algorithm in both accuracy and efficiency in the dynamic feature space, with an average accuracy that is 9.93% higher.

(3) To verify the application value of the proposed algorithms, the thesis applies DAWE and OLLDF to abnormal user identification on keystroke data streams. Two feature extraction methods are designed to obtain single-keystroke features and keystroke-sequence features. On this basis, abnormal user identification models are learned incrementally with DAWE and OLLDF respectively. The experimental results show that, in the keystroke data stream scenario, both DAWE and OLLDF perform well in terms of accuracy and F1.

Together, these results show that the algorithms proposed in this thesis can effectively handle concept drift with mixed evolution types and concept drift caused by feature space changes, and that they are valuable for real-world applications.
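The idea of scoring base classifiers by both accuracy and diversity can be illustrated with a minimal sketch. This is not DAWE itself: the abstract does not give its formulas, so the linear mix controlled by `alpha`, the pairwise-disagreement diversity measure, and the function names `member_weights` and `weighted_vote` are all assumptions made for illustration. The PCA-based drift-degree adjustment is left out.

```python
from typing import Dict, List, Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of instances on which two classifiers predict differently."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def member_weights(all_preds: List[Sequence[int]],
                   labels: Sequence[int],
                   alpha: float = 0.7) -> List[float]:
    """Hypothetical value score: alpha * accuracy + (1 - alpha) * diversity.

    all_preds[i] holds classifier i's predictions on the latest data chunk.
    Diversity is the mean disagreement with every other ensemble member.
    The returned weights are normalized to sum to 1.
    """
    n = len(all_preds)
    values = []
    for i, preds in enumerate(all_preds):
        acc = accuracy(preds, labels)
        others = [disagreement(preds, all_preds[j]) for j in range(n) if j != i]
        div = sum(others) / len(others) if others else 0.0
        values.append(alpha * acc + (1 - alpha) * div)
    total = sum(values)
    if total == 0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / n] * n
    return [v / total for v in values]


def weighted_vote(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Return the class label with the largest total weight."""
    scores: Dict[int, float] = {}
    for label, w in zip(votes, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


# Three base classifiers evaluated on a chunk of four labeled instances.
chunk_preds = [[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]]
chunk_labels = [1, 0, 1, 1]
weights = member_weights(chunk_preds, chunk_labels)
prediction = weighted_vote([1, 1, 0], weights)
```

With `alpha` close to 1 the weights follow accuracy alone; lowering it gives more credit to members that disagree with the rest of the ensemble, which is one plausible way to keep an ensemble responsive to several drift forms.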
Keywords/Search Tags:Data Stream, Classification, Concept Drift, Incremental Learning, Ensemble Learning, Online Learning