Research On Dynamic Data Stream Classification Based On Bayesian Network

Posted on: 2020-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: H M Fan
GTID: 2428330596979677
Subject: Computer application technology
Abstract/Summary:
With the arrival of the big data era, the volume of online data has increased dramatically, and mining massive data streams in real time has become a major challenge in machine learning. Online learning methods process large-scale data in real time by handling the data item by item and updating the model incrementally, and they have received extensive attention from researchers. As an online learning method, Naive Bayes is simple, efficient, and theoretically well founded, and it is commonly used for data stream classification. However, concept drift in the data stream seriously degrades its classification performance, and its assumption of attribute independence is usually not met in real-world applications. To address these problems, this thesis studies three improvements to the Naive Bayes algorithm.

(1) To address the excessively high dimensionality of the feature space in classification and the fact that the attribute independence assumption of Naive Bayes often does not hold, this thesis proposes an information-theoretic classification framework for attribute selection. By analyzing the relationship between Jeffreys divergence and the type I and type II errors of a Bayesian classifier, and in view of the limitations of Jeffreys divergence under multivariate distributions, the multi-Jeffreys-Hypothesis (MJH) measure is introduced to measure differences between multivariate distributions, and a selective Naive Bayes classification algorithm based on MJH is proposed. Experimental results show that the algorithm achieves good classification performance and convergence.

(2) The Naive Bayes classifier has no mechanism for detecting and handling concept drift, so it cannot classify streaming data under non-stationary conditions. This thesis proposes a weighted Naive Bayes algorithm based on a forgetting mechanism. Each instance is weighted by the forgetting mechanism, and its weight decays gradually over time, so that the original Naive Bayes classifier adapts automatically and quickly to changes in the data and thereby copes with concept drift. Experimental results show the effectiveness of the algorithm.

(3) For data streams with concept drift, based on the assumption that historical knowledge and current knowledge are related, and on an analysis of the advantages of ensemble learning, this thesis proposes an ensemble learning algorithm based on knowledge transfer. Knowledge transfer extracts the useful knowledge from the historical model and removes the knowledge that is inconsistent with the latest data distribution, which yields a new historical model. The transferred historical model is then combined, by weighting, with the model derived from the latest data. Experimental results on simulated and real data show that the ensemble learning algorithm based on knowledge transfer makes full use of the advantages of ensemble learning and effectively handles concept drift in data stream classification.
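
The forgetting mechanism in item (2) can be illustrated with a minimal sketch. The abstract does not give the weighting scheme, so the exponential decay used here is an assumption, as are the class name `ForgettingNaiveBayes`, the parameters `decay` and `alpha`, and the restriction to categorical attributes. It is not the thesis's implementation.

```python
from collections import defaultdict
import math


class ForgettingNaiveBayes:
    """Online categorical naive Bayes whose counts decay exponentially.

    Illustrative only: the decay rule is an assumption, not the thesis's.
    """

    def __init__(self, decay=0.99, alpha=1.0):
        self.decay = decay  # hypothetical forgetting factor in (0, 1]
        self.alpha = alpha  # Laplace smoothing constant
        self.class_weight = defaultdict(float)
        # (label, attribute index, value) -> decayed count
        self.feature_weight = defaultdict(float)
        self.values = defaultdict(set)  # attribute index -> values seen

    def learn_one(self, x, y):
        # Age every stored count, then add the new instance with weight 1,
        # so an instance seen t steps ago carries weight decay ** t.
        for key in self.class_weight:
            self.class_weight[key] *= self.decay
        for key in self.feature_weight:
            self.feature_weight[key] *= self.decay
        self.class_weight[y] += 1.0
        for i, v in enumerate(x):
            self.feature_weight[(y, i, v)] += 1.0
            self.values[i].add(v)

    def predict_one(self, x):
        total = sum(self.class_weight.values())
        n_classes = len(self.class_weight)
        best_label, best_score = None, -math.inf
        for label, cw in self.class_weight.items():
            # Smoothed log prior computed from the decayed class weights.
            score = math.log((cw + self.alpha) / (total + self.alpha * n_classes))
            for i, v in enumerate(x):
                n_values = max(len(self.values[i]), 1)
                num = self.feature_weight.get((label, i, v), 0.0) + self.alpha
                den = cw + self.alpha * n_values
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With `decay=1.0` the sketch reduces to ordinary incremental naive Bayes, which keeps predicting the old concept long after the labels of a stream have flipped. A smaller value such as `decay=0.9` lets the recent instances dominate the counts, so the prediction follows the new concept after a few dozen instances.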
Keywords/Search Tags:Bayesian Network, Attribute Selection, Concept Drift, Forgetting Mechanism, Knowledge Transfer