
Concept Drift Detection Algorithm Based On Multi-label Learning With Label Special Features

Posted on: 2021-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: H K Liu
GTID: 2428330626965143
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of network technology, large amounts of data are generated in the form of data streams, and data stream mining has attracted growing attention from researchers. Under the traditional single-label classification framework, each instance is assigned exactly one label. In the real world, however, an instance often belongs to several categories at once, and together these categories form the instance's label set. Omitting any one of them leaves the description of the instance incomplete. Multi-label learning emerged to handle this ambiguity of real-world data. In practice, though, real data contain many redundant features, which raise the computational cost of multi-label learning and degrade classification performance. An effective remedy is to extract features from multi-label data so that redundant features are eliminated. Among such approaches, multi-label learning algorithms based on label-specific features perform label-specific feature selection and classification by exploiting the correlations between labels. However, these algorithms pay little attention to the correlations between instances. Moreover, most real-world data are generated continuously as streams, so multi-label data streams have become an increasingly important research topic. To address these problems, this thesis makes the following contributions:

1. Existing label-specific feature algorithms do not take instance correlation into account, so a classification algorithm is proposed that learns both label-specific features and instance correlations. The model considers not only the correlations between labels but also the correlations between instances in feature space: a similarity graph is constructed to capture the similarity of instances in the feature space, and this instance-similarity information is incorporated into model training. Experimental results show that the proposed algorithm extracts label-specific features more effectively and achieves better classification performance.

2. Most existing concept drift detection methods target single-label data streams and are ill-suited to detecting concept drift in multi-label data streams. This thesis therefore proposes a hierarchical-verification concept drift detection algorithm for multi-label data streams. The algorithm consists of a detection layer and a verification layer: the detection layer flags possible concept drift by detecting changes in the data distribution, and the verification layer confirms whether drift has actually occurred by measuring how much the label confusion matrix has changed. Experiments were carried out on 14 real and synthetic multi-label data sets. Compared with existing methods, the proposed hierarchical verification algorithm performs better in terms of subset accuracy, Jaccard similarity and F-measure, which indicates that it detects concept drift effectively and improves classification performance.
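The abstract does not state the model's objective, so the following is only a minimal sketch of the general idea behind contribution 1, assuming an LLSF-style formulation. A linear model XW ≈ Y carries an L1 penalty on W, so each column of W selects a sparse, label-specific subset of features. A label-correlation graph term (cosine similarity between label columns) and an instance-similarity graph term (a k-nearest-neighbour heat-kernel graph over instances) are added as Laplacian regularisers. The assumed objective is 0.5·||XW − Y||² + (alpha/2)·tr(W L_Y Wᵀ) + (beta/2)·tr(Wᵀ Xᵀ L_X X W) + gamma·||W||₁, minimised with proximal gradient descent (ISTA). All function names and hyperparameters (alpha, beta, gamma, k, sigma) are hypothetical and not taken from the thesis.

```python
import numpy as np


def knn_similarity_graph(X, k=5, sigma=1.0):
    """Symmetric k-NN heat-kernel similarity matrix over the rows (instances) of X."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    order = dist2.copy()
    np.fill_diagonal(order, np.inf)  # an instance is not its own neighbour
    S = np.zeros_like(dist2)
    for i in range(X.shape[0]):
        nbrs = np.argsort(order[i])[:k]
        S[i, nbrs] = np.exp(-dist2[i, nbrs] / sigma ** 2)
    return np.maximum(S, S.T)  # keep an edge if either endpoint lists the other


def label_cosine_similarity(Y):
    """Cosine similarity between label columns, used as a label-correlation graph."""
    norms = np.maximum(np.linalg.norm(Y, axis=0), 1e-12)
    return (Y.T @ Y) / np.outer(norms, norms)


def laplacian(S):
    """Unnormalised graph Laplacian D - S."""
    return np.diag(S.sum(axis=1)) - S


def soft_threshold(A, t):
    """Proximal operator of t * ||A||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def fit_label_specific(X, Y, alpha=0.1, beta=0.1, gamma=0.01, k=5, sigma=1.0, n_iter=500):
    """ISTA for 0.5||XW-Y||^2 + alpha/2 tr(W L_Y W^T) + beta/2 tr(W^T X^T L_X X W) + gamma||W||_1."""
    L_x = laplacian(knn_similarity_graph(X, k, sigma))
    L_y = laplacian(label_cosine_similarity(Y))
    A = X.T @ X + beta * (X.T @ L_x @ X)  # quadratic term acting on W from the left
    XtY = X.T @ Y
    # A bound on the Lipschitz constant of the smooth gradient gives a safe constant step size
    lip = max(np.linalg.norm(A, 2) + alpha * np.linalg.norm(L_y, 2), 1e-12)
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = A @ W - XtY + alpha * W @ L_y
        W = soft_threshold(W - grad / lip, gamma / lip)
    return W
```

In this sketch, the nonzero entries in column j of W (for example `np.flatnonzero(W[:, j])`) mark the features selected for label j. Predictions could then be obtained by thresholding `X @ W`, for instance at 0.5, which is again an assumption.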
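The two-layer structure of the drift detector in contribution 2 can be illustrated with a small, standard-library-only sketch. The concrete tests are assumptions, because the abstract does not specify them. Here the detection layer compares per-feature means of a reference window and a current window with a z-statistic. The verification layer then measures the average L1 distance between the normalised per-label confusion matrices (TP, FP, FN, TN rates) of the two windows, and confirms drift only if that change is large. Function names and the thresholds `z_threshold` and `tau` are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def detection_layer(ref_X, cur_X, z_threshold=3.0):
    """Flag a possible drift if any feature's mean shifts by more than z_threshold standard errors."""
    for j in range(len(ref_X[0])):
        a = [x[j] for x in ref_X]
        b = [x[j] for x in cur_X]
        diff = abs(mean(a) - mean(b))
        se = sqrt(pvariance(a) / len(a) + pvariance(b) / len(b))
        if (se == 0 and diff > 0) or (se > 0 and diff / se > z_threshold):
            return True
    return False


def label_confusion(Y_true, Y_pred):
    """Per-label (TP, FP, FN, TN) rates for binary label vectors."""
    n = len(Y_true)
    rates = []
    for j in range(len(Y_true[0])):
        tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] and p[j])
        fp = sum(1 for t, p in zip(Y_true, Y_pred) if not t[j] and p[j])
        fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[j] and not p[j])
        rates.append((tp / n, fp / n, fn / n, (n - tp - fp - fn) / n))
    return rates


def verification_layer(ref_true, ref_pred, cur_true, cur_pred, tau=0.2):
    """Confirm drift if the label confusion matrices changed by more than tau (mean L1 distance)."""
    ref = label_confusion(ref_true, ref_pred)
    cur = label_confusion(cur_true, cur_pred)
    change = mean([sum(abs(a - b) for a, b in zip(r, c)) for r, c in zip(ref, cur)])
    return change > tau


def hierarchical_drift_check(ref, cur, z_threshold=3.0, tau=0.2):
    """ref and cur are (X, Y_true, Y_pred) windows; returns 'stable', 'false alarm' or 'drift'."""
    if not detection_layer(ref[0], cur[0], z_threshold):
        return "stable"
    if verification_layer(ref[1], ref[2], cur[1], cur[2], tau):
        return "drift"
    return "false alarm"
```

The design intent is that a shift in the input distribution alone is treated only as a warning. Drift is reported only when the classifier's per-label behaviour on the new window also changes.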
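The three evaluation measures mentioned in the abstract can be computed as shown below. Subset accuracy is the fraction of instances whose predicted label set matches the true label set exactly. For Jaccard similarity and F-measure, this sketch assumes the example-based variants (computed per instance, then averaged), because the abstract does not say which averaging the thesis uses. By convention here, an instance with empty true and predicted label sets scores 1.0.

```python
def subset_accuracy(Y_true, Y_pred):
    """Fraction of instances whose predicted label vector equals the true one exactly."""
    return sum(list(t) == list(p) for t, p in zip(Y_true, Y_pred)) / len(Y_true)


def _label_sets(t, p):
    """Indices of the active labels in a true and a predicted binary label vector."""
    return {j for j, v in enumerate(t) if v}, {j for j, v in enumerate(p) if v}


def jaccard_similarity(Y_true, Y_pred):
    """Example-based Jaccard: mean of |T & P| / |T | P| over instances."""
    scores = []
    for t, p in zip(Y_true, Y_pred):
        ts, ps = _label_sets(t, p)
        union = ts | ps
        scores.append(len(ts & ps) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def f_measure(Y_true, Y_pred):
    """Example-based F1: mean of 2|T & P| / (|T| + |P|) over instances."""
    scores = []
    for t, p in zip(Y_true, Y_pred):
        ts, ps = _label_sets(t, p)
        denom = len(ts) + len(ps)
        scores.append(2 * len(ts & ps) / denom if denom else 1.0)
    return sum(scores) / len(scores)
```

Subset accuracy is the strictest of the three measures, since a single wrong label makes the whole instance count as an error. Jaccard similarity and F-measure give partial credit for partially correct label sets.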
Keywords/Search Tags: Multi-label, Label special features, Concept drift, Data stream