Research On Multi-label Data Stream Semi-supervised Integrated Classification Method Based On Cooperative Training

Posted on:2021-05-21

Degree:Master

Type:Thesis

Country:China

Candidate:Z Chu

Full Text:PDF

GTID:2428330614460369

Subject:Computer application technology

Abstract/Summary:

With the rapid development and popularization of network technology, massive, fast-moving data streams are generated in areas such as network traffic monitoring and credit card fraud detection. Besides their high volume and high speed, data streams exhibit multiple labels, a large proportion of instances with missing labels, newly emerging classes, and concept drift over time. Mining the potentially valuable information they contain is therefore an important task for multi-label data stream classification. This dissertation studies classification methods that address missing labels, new class emergence, and concept drift in multi-label data streams. Our contributions are as follows.

1) To address missing labels and new class emergence in multi-label data streams, this dissertation proposes a semi-supervised multi-label data stream classification method based on Co-training. First, the algorithm uses a sliding window mechanism to divide the data stream into chunks and uses the multi-label semi-supervised classification algorithm COINS to train base classifiers on the first w chunks, building an ensemble model that copes with the large amount of unlabeled data in the streaming environment. At the same time, a new class detection mechanism is introduced: the ensemble model predicts the (w+1)th data chunk to detect whether a new label has appeared. When a new label is detected, a classifier is retrained on the current data chunk to update the ensemble model. Experiments show that, compared with classic algorithms, the proposed algorithm improves the accuracy of multi-label data stream classification when many class labels are missing and new classes emerge.

2) To address missing labels and concept drift in multi-label data streams, this dissertation proposes a multi-label data stream classification method based on Tri-training and KL divergence. First, an ensemble model based on the online sequential extreme learning machine (OS-ELM) classifier is built using the Tri-training mechanism. Second, based on an analysis of why data distributions change, KL divergence is introduced to monitor changes in the feature space and the label space in order to detect virtual and real concept drifts, and the ensemble model is updated according to the type of drift detected. Finally, experiments show that the proposed method can effectively detect the virtual and real concept drifts hidden in multi-label data streams while improving the classification accuracy of the model.
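The abstract does not give the details of the KL-divergence drift test, so the following is only a minimal sketch of the general idea: compare the feature and label distributions of two consecutive data chunks and flag a drift when the divergence exceeds a threshold. The function names, the histogram summary of the feature space, the threshold value, and the mapping of feature-space change to virtual drift and label-space change to real drift are all assumptions made for illustration, not the dissertation's actual procedure.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for two discrete distributions, smoothed to avoid log(0)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def label_distribution(Y):
    """Normalised label frequencies; Y is an (n_samples, n_labels) 0/1 matrix."""
    counts = np.asarray(Y, dtype=float).sum(axis=0)
    return counts / max(counts.sum(), 1.0)

def feature_distribution(X, bins, value_range):
    """Histogram over all feature values in a chunk (a deliberately crude summary)."""
    hist, _ = np.histogram(np.asarray(X, dtype=float).ravel(),
                           bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def detect_drift(X_old, Y_old, X_new, Y_new,
                 threshold=0.1, bins=10, value_range=(0.0, 1.0)):
    """Compare two chunks; threshold and bin settings are hypothetical."""
    feature_kl = kl_divergence(
        feature_distribution(X_new, bins, value_range),
        feature_distribution(X_old, bins, value_range),
    )
    label_kl = kl_divergence(label_distribution(Y_new),
                             label_distribution(Y_old))
    return {
        "feature_kl": feature_kl,
        "label_kl": label_kl,
        # Assumption: feature-space change is treated as virtual drift,
        # label-space change as real drift.
        "virtual": feature_kl > threshold,
        "real": label_kl > threshold,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_old = rng.uniform(0, 1, size=(200, 5))
    X_new = rng.uniform(0, 1, size=(200, 5))   # same feature distribution
    Y_old = np.tile([1, 0, 0], (200, 1))       # only the first label is active
    Y_new = np.tile([0, 0, 1], (200, 1))       # only the third label is active
    print(detect_drift(X_old, Y_old, X_new, Y_new))
```

In a streaming setting, a check like this would run once per incoming chunk, and a positive result would trigger an update of the ensemble; a practical detector would likely use per-feature or multivariate density estimates rather than a single pooled histogram.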

Keywords/Search Tags:

multi-label data stream, semi-supervised, new class emerging, concept drift

Related items

1	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
2	Learning On Evolving Data Streams
3	Research On Semi-supervised Data Stream Classification Method Based On Ensemble Model
4	Research On Class Incremental Learning And Concept Drift Detection In Multi-label Data Streams Classification
5	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
6	Concept Drift Detection Algorithm Based On Multi-label Learning With Label Special Features
7	Research On Classification Of Data Stream With Recurring Concept Drift
8	Research On Concept Drift Detection In Data Stream And Classification Algorithms For Imbalanced Data Stream
9	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift
10	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering