Research on a Multi-label Data Stream Classification Algorithm Based on the Very Fast Decision Tree

Posted on: 2023-11-24
Degree: Master
Type: Thesis
Country: China
Candidate: W W Pan
Full Text: PDF
GTID: 2568306848467374
Subject: Computer technology
Abstract/Summary:
Dynamic data streams are becoming increasingly common in the era of big data, and data mining techniques are flourishing, with multi-label data stream mining attracting a growing number of researchers. However, as areas such as sensor networks and social networks generate massive amounts of fast, continuous multi-label data, the data stream scenario presents new challenges: real-time processing, the inability to store the full data volume, single-pass access to samples, and concept drift. This dissertation addresses the multi-label dynamic data stream scenario and designs new algorithms to meet these challenges.

Few studies have used a cascade strategy to build incremental models for the online multi-label data stream classification problem. This work uses the deep forest structure to perform representation learning layer by layer; the cascade strategy is flexible and controllable and does not rely on backpropagation. Building on the Hoeffding Tree, the fast decision tree, the Mondrian Forest, and the Adaptive Random Forest, an incremental hierarchical framework based on the cascade strategy, MARDF, is proposed. On this basis, an incremental deep forest algorithm based on the fast decision tree, VDSDF, is proposed. Both use a similar hierarchical structure that retains the capacity for representation learning, and both learn incrementally, so both can continuously receive samples arriving at high speed and predict them in real time. VDSDF uses the incremental fast decision tree forest designed in this dissertation as its base unit and uses an incremental learning approach different from that of the MARDF framework in order to adapt to unstable data stream scenarios.

The VDSDF algorithm is applicable to both stable multi-label data stream scenarios and unstable dynamic data stream scenarios. Experimental results show that VDSDF addresses, to a certain extent, the challenges posed by online multi-label data stream mining. Although it is an incremental learning algorithm, VDSDF scores higher than batch learning algorithms on several evaluation metrics, and it has an advantage in model accuracy over the latest incremental learning algorithms. In addition, VDSDF outperforms the comparison algorithms in adapting to concept drift in unstable data streams. In real-world applications, the label prediction accuracy of VDSDF is also better than that of the comparison algorithms, both on static multi-label film review data and on unstable dynamic data sources.
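The abstract does not spell out how a Hoeffding Tree or fast decision tree decides when to split a leaf on streaming data. As a rough illustration only, the sketch below shows the standard Hoeffding-bound split test from the general VFDT literature. It is not the thesis's implementation of MARDF or VDSDF: the function names, the `tie_threshold` parameter, and the example values are all hypothetical and chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, delta is the allowed
    probability of a wrong decision, n is the number of samples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, value_range: float,
                 delta: float, n: int, tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf has seen enough samples to split.

    Split when the best attribute beats the runner-up by more than epsilon,
    or when epsilon is so small that the two are effectively tied
    (tie_threshold is an assumed, illustrative default).
    """
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Hypothetical gains observed at one leaf after 200 and 5000 samples.
    for n in (200, 5000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.20, 1.0, 1e-7, n))
```

Because the bound shrinks as more samples arrive, a leaf can commit to a split after seeing each sample only once, which is what makes this family of trees suitable for the single-pass, non-stored stream setting described above.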
Keywords/Search Tags: multi-label classification, data stream mining, incremental learning, deep forest, concept drift