
Research On Classifiers For Data Streams Based On Active Learning

Posted on: 2018-07-19  Degree: Master  Type: Thesis
Country: China  Candidate: X X Zhang  Full Text: PDF
GTID: 2348330542460056  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information science and technology, applications such as the Internet of Things, smart cities, social networking, and precision medicine have become part of daily life. The Internet is no longer only a source of information; it is also a platform for disseminating and re-processing it, and information overload and data explosion are now commonplace. Data that arrive continuously in fields such as finance and transportation are called stream data, or data streams. Classifying a data stream requires attention both to the characteristics of the stream itself (for example, instances arrive rapidly and in time order) and to concept drift, which sharply distinguishes stream data from traditional static data. For these reasons, data stream classification, a branch of data mining, is receiving growing research attention. Classification is a supervised learning task, but in real data stream settings labeled data are scarce. To improve the generalization ability of the classifier, unlabeled instances must be exploited, and the class labels of some historical data may even need to be queried. Because of concept drift, the classification model must also be updated by requesting the true labels of representative and informative instances. Manual labeling, however, is time-consuming and expensive. In view of this, the main work and contributions of this thesis are as follows:
(1) Classification of data streams and active learning methods. The thesis first introduces the characteristics, processing models, and processing methods of data streams. It then reviews the classical single-classifier and ensemble-classifier algorithms proposed for the data stream setting. Next, it points out that real data streams lack sufficient labeled instances and presents the concept of active learning, the traditional active learning model, and an incremental active learning framework adapted to the data stream environment. Finally, it introduces three criteria for selecting instances in active learning and several kinds of stream-based active learning strategies.
(2) Stream-based active learning strategies with evidence. Given the shortage of labeled instances in real data stream environments, and based on an analysis of existing stream-based active learning strategies, the Esplit strategy is proposed. Compared with the Split method, Esplit further considers why unlabeled instances in the boundary region are uncertain. Because a data stream is dynamic and can be scanned only once, the thesis proposes a weighted evidence measure and uses a window mechanism and a dynamic threshold to select uncertain instances with high evidence values, so that the manually labeled instances are of high value. At the same time, a random strategy is used to preserve the current distribution of the data stream and to detect concept drift.
(3) Classifiers for data streams based on active learning. The EDCA algorithm is proposed by combining an ensemble classifier model with the Esplit strategy. Esplit, originally a stream-based strategy, is adapted into a block-based active learning strategy. EDCA preserves the distribution of each block through the random strategy, which yields an objective estimate of the weight of every member classifier, and it improves the generalization ability of the classifier through the evidence-based uncertainty sampling strategy. Experiments on four real datasets, covering average accuracy and parameter sensitivity for both the Esplit active learning method and the EDCA algorithm, verify the effectiveness of the proposed approach.
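To make the general idea behind a split-style query strategy concrete, here is a minimal Python sketch. It is not the Esplit or EDCA method from the thesis: the abstract does not define the weighted evidence measure or the window mechanism, so both are left out. It shows only the generic combination of a random branch with an uncertainty branch governed by a dynamic threshold. The class name `SplitSampler`, its parameters, and their default values are assumptions made for illustration.

```python
import random


class SplitSampler:
    """Generic split-style query strategy (illustrative sketch only).

    Assumed parameters, not taken from the thesis:
      random_fraction: share of instances routed to random sampling
      threshold:       initial confidence threshold for uncertainty sampling
      step:            relative adjustment applied to the threshold
    """

    def __init__(self, random_fraction=0.2, threshold=1.0, step=0.01, seed=0):
        self.random_fraction = random_fraction
        self.threshold = threshold
        self.step = step
        self.rng = random.Random(seed)

    def should_query(self, class_probs):
        """Decide whether to request the true label of one instance.

        class_probs is the classifier's posterior estimate for the instance,
        given as a sequence of class probabilities.
        """
        # Random branch: queries labels regardless of confidence, so the
        # labeled sample also reflects the current data distribution.
        if self.rng.random() < self.random_fraction:
            return True

        # Uncertainty branch with a dynamic threshold.
        confidence = max(class_probs)
        if confidence < self.threshold:
            # Uncertain instance: query it and tighten the threshold.
            self.threshold *= 1.0 - self.step
            return True
        # Confident instance: skip it and relax the threshold, so that
        # querying resumes if the classifier stays confident for long.
        self.threshold *= 1.0 + self.step
        return False
```

In use, `should_query` would be called once per arriving instance with the current model's predicted probabilities, and the model would be updated whenever a label is obtained. A labeling budget is omitted here for brevity.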
Keywords/Search Tags: data stream, active learning, concept drift, split sampling strategy, evidence, ensemble classifier