
Research On Partially-lazy Classification Algorithm Based On The Pattern Representation Of Data Stream

Posted on: 2018-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: J J Jiang
Full Text: PDF
GTID: 2348330512493335
Subject: Computer Science and Technology
Abstract/Summary:
A data stream is a high-speed, theoretically infinite, continuous sequence of data elements whose underlying distribution often changes over time. Because a data stream is continuous, unbounded, fast, and time-varying in its distribution, processing it requires a single scan of the data, adaptation to the changing distribution, compact memory usage, and limited processing time. Building classification models from patterns extracted from large-scale data is an important research problem and a feasible approach. One example is the pattern-based Bayesian classifier, which uses a subset of patterns to evaluate probability approximations and thereby relaxes the conditional independence assumption made by Naive Bayes. However, most existing pattern-based Bayesian classifiers target static data sets: they need multiple scans of the whole database, require long processing time and a large amount of memory, and cannot adapt to a dynamic data stream environment. This thesis proposes a pattern-based classification model, PBDS (Pattern-based Bayesian classifier for Data Stream), which applies partially lazy learning to data streams. PBDS focuses on building and evaluating reliable probability approximations by exploiting frequent patterns that are extracted from the data stream and tailored to a given test case. The main work includes:

(1) A single-pass algorithm, FFI (Find Frequent Item algorithm on data stream), is proposed for online mining of frequent itemsets over a data stream under the sliding window model. To accelerate data processing and reduce memory consumption, a simpler data structure, the hybrid tree structure, is proposed to maintain the set of items in the current window, and a pattern extraction mechanism is proposed to reduce the generation of candidate itemsets.

(2) A pattern-based Bayesian classifier for data streams, PBDS, is proposed, using a partially lazy learning strategy. In the training stage, PBDS builds a condensed representation of the data in the form of itemsets. When a classification request arrives, PBDS builds a local classification model from the selected itemsets. The classification result is less affected by changes in the underlying data distribution, because the local model captures data locality well. To ensure a quick response to classification requests, part of the model-building work is moved into the training stage.

(3) A new definition of frequent pattern based on the window model over data streams is proposed. When the patterns in the data stream are extracted incompletely, a smoothing technique is used to handle the undetected items.

PBDS is compared with other algorithms on both real and synthetic data sets. The experimental results show that PBDS outperforms the other classifiers in classification accuracy and runtime.
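The split between single-pass window maintenance and model building at request time can be illustrated with a small Python sketch. It is not the thesis's FFI or PBDS: the class name, parameters, and data are hypothetical, itemsets are enumerated naively instead of being stored in a hybrid tree, and the score simply treats each matched frequent itemset as independent evidence with Laplace smoothing instead of forming the probability approximation a pattern-based Bayesian classifier would use.

```python
from collections import Counter, deque
from itertools import combinations
import math


class SlidingWindowPatternClassifier:
    """Hypothetical sketch: naive itemset counts over a sliding window."""

    def __init__(self, window_size=4, min_support=2, max_len=2):
        self.window = deque()            # (itemsets, label) per transaction
        self.window_size = window_size
        self.min_support = min_support   # absolute support inside the window
        self.max_len = max_len           # longest itemset that is counted
        self.pattern_counts = Counter()  # (itemset, label) -> count in window
        self.class_counts = Counter()    # label -> count in window

    def _itemsets(self, items):
        items = sorted(set(items))
        return [frozenset(c)
                for k in range(1, self.max_len + 1)
                for c in combinations(items, k)]

    def update(self, items, label):
        """Training stage: add one transaction, evict the oldest if full."""
        sets = self._itemsets(items)
        self.window.append((sets, label))
        self.class_counts[label] += 1
        for s in sets:
            self.pattern_counts[(s, label)] += 1
        if len(self.window) > self.window_size:
            old_sets, old_label = self.window.popleft()
            self.class_counts[old_label] -= 1
            for s in old_sets:
                self.pattern_counts[(s, old_label)] -= 1

    def _labels(self):
        return [c for c, n in self.class_counts.items() if n > 0]

    def frequent_patterns(self, items):
        """Lazy step: frequent itemsets that are contained in the test case."""
        labels = self._labels()
        return [s for s in self._itemsets(items)
                if sum(self.pattern_counts[(s, c)] for c in labels)
                >= self.min_support]

    def classify(self, items):
        """Score each class with its prior and smoothed itemset estimates."""
        patterns = self.frequent_patterns(items)
        total = len(self.window)
        best, best_score = None, -math.inf
        for c in self._labels():
            score = math.log(self.class_counts[c] / total)
            for s in patterns:
                # Laplace smoothing keeps itemsets unseen in class c above zero.
                p = (self.pattern_counts[(s, c)] + 1) / (self.class_counts[c] + 2)
                score += math.log(p)
            if score > best_score:
                best, best_score = c, score
        return best


clf = SlidingWindowPatternClassifier(window_size=4, min_support=2, max_len=2)
stream = [(["a", "b"], "x"), (["a", "b"], "x"),
          (["c", "d"], "y"), (["c", "d"], "y"), (["c"], "y")]
for items, label in stream:
    clf.update(items, label)
print(clf.classify(["c", "d"]))  # y
```

In this toy stream the fifth transaction evicts the first, so the itemsets of `["a", "b"]` fall below the support threshold and no longer contribute to the local model. This mirrors, in a very simplified way, how a window-based model follows a changing distribution.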
Keywords/Search Tags: Data stream, Frequent pattern, Bayesian, Partially-lazy learning