
Research On Multi-label Data Stream Classification Method Based On Kernel Extreme Learning Machine

Posted on: 2022-01-10
Degree: Master
Type: Thesis
Country: China
Candidate: H X Zhang
GTID: 2518306560454984
Subject: Software engineering
Abstract/Summary:
With the rapid development and popularization of technology, huge volumes of streaming data have emerged in fields such as image recognition and text content. These data are high-volume, high-speed, multi-label, and high-dimensional, and they exhibit concept drift because the data distribution changes over time. How to efficiently mine useful information from multi-label streaming data has therefore become an important problem in data stream classification. This dissertation studies multi-label data stream classification with concept drift and streaming feature selection. The main contributions are as follows.

(1) To address concept drift in multi-label data streams, we propose a data stream ensemble classification method based on the Kernel Extreme Learning Machine (KELM). Firstly, the method uses KELM to train a base classifier on each sliding data chunk from the first k time steps and builds an ensemble model that can adapt to potential changes in the data distribution. Secondly, the Apriori algorithm is used to obtain the label correlations of each data chunk, and the confidence of co-occurring labels is introduced into the ensemble prediction process to improve the overall classification accuracy. Thirdly, a concept drift detection mechanism based on the MuENLForest model is introduced. The mechanism uses the ensemble model to predict the classification result of the (k+1)th data chunk and detects whether concept drift has occurred. When concept drift is detected, the current data chunk is used to retrain a single model, a weight function reduces the proportion of old data in subsequent calculations, and the ensemble model is updated according to classification accuracy. Finally, experimental results show that, compared with classic multi-label classification algorithms, the proposed method achieves better classification accuracy while adapting to concept drift in multi-label data streams.

(2) To address high-dimensional features and multiple types of concept drift in multi-label data streams, we propose a high-dimensional multi-label data stream ensemble classification method based on the online sequential kernel extreme learning machine. Firstly, the data stream is divided into data chunks by a sliding window mechanism, the online sequential KELM trains one base classifier on each of the first k data chunks to form the ensemble model, and the ensemble model is used to classify the (k+1)th data chunk. Secondly, cosine similarity is computed over the label information in each data chunk to obtain a label similarity matrix, and an online streaming feature dimensionality reduction method based on K nearest neighbors and label correlation is proposed to reduce the impact of high-dimensional features on classification accuracy. Thirdly, cosine similarity is further used to detect whether concept drift has occurred in the features, the labels, or both between the new and old data chunks. Depending on the detection result, the base classifier is retrained on the new data chunk to update the ensemble model. Finally, experimental results show that the proposed method maintains good classification performance on high-dimensional multi-label data streams and adapts well to concept drift in both features and labels.
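
To make the building block of contribution (1) concrete, the following is a minimal sketch of a kernel extreme learning machine base classifier and an ensemble over the first k chunks. It uses the standard closed-form KELM solution, beta = (K + I/C)^{-1} T. The RBF kernel, the values of C and gamma, the class and function names, the unweighted score averaging, and the synthetic data are all assumptions made for illustration. The label-correlation step, the drift detector, and the weight function described in the abstract are not reproduced here.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


class KELM:
    """Kernel extreme learning machine for multi-label targets in {0, 1}."""

    def __init__(self, C=10.0, gamma=1.0):
        self.C = C          # regularization strength (assumed value)
        self.gamma = gamma  # RBF kernel width (assumed value)

    def fit(self, X, Y):
        self.X_train = X
        K = rbf_kernel(X, X, self.gamma)
        T = 2.0 * Y - 1.0  # map {0, 1} labels to {-1, +1} targets
        # Closed-form output weights: beta = (K + I / C)^{-1} T
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def decision_function(self, X):
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta

    def predict(self, X):
        return (self.decision_function(X) > 0).astype(int)


def ensemble_predict(models, X):
    """Average base-classifier scores (unweighted, an assumption) and threshold at 0."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)


def make_chunk(rng, n=60):
    """Synthetic chunk: 2 features, 3 correlated labels (illustrative only)."""
    X = rng.normal(size=(n, 2))
    Y = np.column_stack(
        [X[:, 0] > 0, X[:, 1] > 0, X[:, 0] + X[:, 1] > 0]
    ).astype(int)
    return X, Y


rng = np.random.default_rng(0)
chunks = [make_chunk(rng) for _ in range(4)]

# Train one base classifier per chunk for the first k = 3 chunks.
models = [KELM().fit(X, Y) for X, Y in chunks[:3]]

# Use the ensemble to predict the (k+1)th chunk.
X_new, Y_new = chunks[3]
accuracy = np.mean(ensemble_predict(models, X_new) == Y_new)
print(f"per-label accuracy on chunk k+1: {accuracy:.3f}")
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse. Because each base classifier stores its own training chunk, memory grows with k, which is one reason a bounded sliding window is a natural fit for this kind of ensemble.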
Keywords/Search Tags: Multi-label data stream, Kernel extreme learning machine, Concept drift, Label correlation, Streaming feature selection
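
The cosine-similarity steps of contribution (2) can be illustrated with a small sketch. The label similarity matrix below is the cosine similarity between label columns of a chunk, as the abstract describes. The abstract does not say which vectors are compared for drift detection, so the choices here are assumptions: feature drift is checked by comparing the mean feature vectors of the old and new chunks, label drift by comparing their flattened label similarity matrices, and the threshold of 0.9 and all function names are hypothetical.

```python
import numpy as np


def label_similarity(Y):
    """Cosine similarity between label columns of a binary (n_samples, n_labels) matrix."""
    Y = np.asarray(Y, dtype=float)
    norms = np.linalg.norm(Y, axis=0)
    norms[norms == 0] = 1.0  # a label absent from the chunk gets zero similarity
    Yn = Y / norms
    return Yn.T @ Yn


def cosine(u, v):
    """Cosine similarity of two 1-D vectors, defined as 0.0 if either is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def detect_drift(X_old, Y_old, X_new, Y_new, threshold=0.9):
    """Flag drift in features, labels, or both (comparison rule and threshold are assumed)."""
    feat_sim = cosine(X_old.mean(axis=0), X_new.mean(axis=0))
    label_sim = cosine(
        label_similarity(Y_old).ravel(), label_similarity(Y_new).ravel()
    )
    return {"feature": feat_sim < threshold, "label": label_sim < threshold}


# Toy chunks: the "new" chunk shifts both the feature distribution
# and which labels co-occur.
X_old = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]])
Y_old = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
X_new = np.array([[0.0, 1.0], [0.1, 1.0], [0.0, 0.9]])
Y_new = np.array([[1, 0, 1], [1, 0, 1], [0, 1, 0]])

print(label_similarity(Y_old))
print(detect_drift(X_old, Y_old, X_new, Y_new))
```

Reporting feature and label drift separately mirrors the abstract's distinction between drift in features, in labels, or in both, so a caller could decide which base classifiers to retrain in each case.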