Research On Multi-label Data Stream Semi-supervised Integrated Classification Method Based On Cooperative Training

Posted on: 2021-05-21  Degree: Master  Type: Thesis
Country: China  Candidate: Z Chu  Full Text: PDF
GTID: 2428330614460369  Subject: Computer application technology
Abstract/Summary:
With the rapid development and popularization of network technology, massive, fast-moving data streams are generated in areas such as network traffic monitoring and credit card fraud detection. Besides their high volume and high speed, these data streams carry multiple labels, contain a large proportion of instances with missing labels, and exhibit newly emerging classes and concept drift over time. Mining the valuable information hidden in such data has therefore become an important task for multi-label data stream classification. This dissertation studies classification methods that address missing labels, new class emergence, and concept drift in multi-label data streams. Our contributions are as follows.

1) To address missing labels and new class emergence in multi-label data streams, this dissertation proposes a semi-supervised multi-label data stream classification method based on Co-training. First, the algorithm uses a sliding window mechanism to divide the data stream into chunks, and uses the multi-label semi-supervised classification algorithm COINS to train base classifiers on the first w data chunks, building an ensemble model that can cope with the large amount of unlabeled data in a streaming environment. At the same time, a new class detection mechanism is introduced: the ensemble model predicts the (w+1)th data chunk to detect whether a new label has appeared. When a new label is detected, a classifier is retrained on the current data chunk to update the ensemble model. Experiments show that, compared with classic algorithms, the proposed algorithm improves the accuracy of multi-label data stream classification when a large number of class labels are missing and new classes emerge.

2) To address missing labels and concept drift in multi-label data streams, this dissertation proposes a multi-label data stream classification method based on Tri-training and KL divergence. First, using the Tri-training mechanism, an ensemble model based on the Online Sequential Extreme Learning Machine (OS-ELM) classifier is built. Second, based on an analysis of why data distributions change, KL divergence is introduced to monitor changes in the feature space and the label space in order to detect virtual and real concept drifts. The ensemble model is then updated according to the type of drift detected. Finally, experiments show that the proposed method can effectively detect the virtual and real concept drifts hidden in multi-label data streams while improving the classification accuracy of the model.
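As a rough illustration of the KL-divergence monitoring described in the abstract, the sketch below compares a reference chunk with a new chunk. It is not the dissertation's implementation. The function names, the single-feature histogram, the smoothing constant, the threshold value, and the rule that maps a label-space change to "real" drift and a feature-space-only change to "virtual" drift are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-9) -> float:
    """KL(p || q) for two discrete distributions, smoothed to avoid log(0)."""
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    p_sum, q_sum = sum(p_s), sum(q_s)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum))
        for a, b in zip(p_s, q_s)
    )


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized histogram of one feature over [lo, hi] (hypothetical helper)."""
    if not values:
        return [1.0 / bins] * bins
    counts = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def label_distribution(chunk_labels: Sequence[Sequence[int]], n_labels: int) -> List[float]:
    """Relative frequency of each label in a chunk of binary label vectors."""
    counts = [0.0] * n_labels
    for row in chunk_labels:
        for j, v in enumerate(row):
            counts[j] += v
    total = sum(counts)
    return [c / total for c in counts] if total else [1.0 / n_labels] * n_labels


def detect_drift(ref_x, new_x, ref_y, new_y, n_labels, threshold=0.1, bins=4):
    """Return 'real', 'virtual', or 'none' (assumed decision rule, see lead-in)."""
    feature_kl = kl_divergence(
        histogram(new_x, bins, 0.0, 1.0), histogram(ref_x, bins, 0.0, 1.0)
    )
    label_kl = kl_divergence(
        label_distribution(new_y, n_labels), label_distribution(ref_y, n_labels)
    )
    if label_kl > threshold:
        return "real"      # label space changed
    if feature_kl > threshold:
        return "virtual"   # only the feature space changed
    return "none"
```

In a streaming setting, `detect_drift` would be called once per incoming chunk, with the reference chunk taken from the current sliding window. A real system would monitor every feature rather than a single one, and would have to estimate the label distribution from the partially labeled data.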
Keywords/Search Tags: multi-label data stream, semi-supervised, new class emerging, concept drift