Research On Semi-supervised Data Stream Classification Method Based On Ensemble Model

Posted on:2022-09-07

Degree:Master

Type:Thesis

Country:China

Candidate:X L Zheng

GTID:2518306560455654

Subject:Computer technology

Abstract/Summary:

With the development of the Internet and big data technology, more and more real-world applications in our daily lives, such as news retrieval, Taobao shopping, and bank transactions, are generating massive amounts of streaming data. Unlike the static data used in traditional data mining tasks, these data streams have many new characteristics, such as high volume, high speed, hidden concept drift, and concept evolution. They may also be multi-label, which aggravates the problems of label imbalance and label noise, so data stream classification faces unprecedented challenges. Efficiently and accurately mining the potentially valuable information in a data stream has therefore become an important task. This dissertation uses semi-supervised classification models to address a series of problems in data stream classification, chief among them the lack of label information. The main contributions are as follows.

(1) To handle insufficient label information and concept evolution in real data streams, a semi-supervised classification algorithm for single-label data streams is proposed. The method uses a small amount of labeled data to construct a semi-supervised classification model. To detect concept evolution, it uses the properties of class clusters, namely cohesion within clusters and sparseness between clusters, to decide whether an instance belongs to a novel class. To handle hidden recurring concept drift, it first uses a detection mechanism to track significant changes in a window of confidence scores, and then calculates the distance between the distributions before and after the drift to confirm that the drift is recurring. Extensive experiments show that, compared with classic data stream classification methods, the proposed method not only achieves higher classification accuracy but also effectively detects the recurring concept drift and concept evolution hidden in single-label data streams.

(2) To deal with concept drift, class label imbalance, and label noise, all of which are aggravated in multi-label data streams, a semi-supervised classification algorithm for multi-label data streams is proposed. The method again uses a small amount of labeled data to construct the classification model. To adapt to the multiple types of concept drift (heterogeneous concept drift) found in multi-label data streams, it adopts a self-adjusting sliding window mechanism. To handle label noise and class imbalance, it adopts an error punishment mechanism that removes data polluted by label noise, and data that causes class imbalance, from the window as soon as possible. Extensive experiments show that, compared with classic multi-label classification methods and multi-label data stream classification methods, the proposed method adapts well to heterogeneous concept drift, label noise, and class imbalance while maintaining better classification accuracy under various data conditions.
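The abstract does not give the rule behind the error punishment mechanism, so the following Python snippet is only a minimal sketch of one way such a window could work, under stated assumptions: the classifier is assumed to vote with the k nearest stored instances, an instance is assumed to "err" when it carries a label the true instance lacks, and the names `PunitiveWindow`, `WindowItem`, `max_size`, and `max_errors` are hypothetical. The self-adjusting window size is not modelled here; a fixed capacity stands in for it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WindowItem:
    features: tuple
    labels: frozenset
    errors: int = 0  # times this item voted for a label that was wrong


class PunitiveWindow:
    """Sliding window that evicts items by age or by accumulated errors.

    Illustrative only: not the thesis's algorithm, just one plausible
    reading of "error punishment" for a kNN-style voter.
    """

    def __init__(self, max_size=1000, max_errors=3):
        self.max_size = max_size      # assumed fixed capacity
        self.max_errors = max_errors  # assumed eviction threshold
        self.items = deque()

    def add(self, features, labels):
        """Store a labeled instance, dropping the oldest when full."""
        self.items.append(WindowItem(tuple(features), frozenset(labels)))
        while len(self.items) > self.max_size:
            self.items.popleft()

    def neighbours(self, features, k=3):
        """Return the k stored items closest to `features` (squared Euclidean)."""
        def sq_dist(item):
            return sum((a - b) ** 2 for a, b in zip(item.features, features))
        return sorted(self.items, key=sq_dist)[:k]

    def punish(self, neighbours, true_labels):
        """Penalise voters carrying labels absent from the true label set,
        then evict every item that has reached the error threshold."""
        true_labels = frozenset(true_labels)
        for item in neighbours:
            if item.labels - true_labels:
                item.errors += 1
        self.items = deque(i for i in self.items if i.errors < self.max_errors)
```

In this sketch a noisy instance that keeps landing among the neighbours of correctly labeled arrivals accumulates errors and leaves the window before it would age out, which is the "as soon as possible" behaviour the abstract describes.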

Keywords/Search Tags:

Data stream classification, Concept drift, Concept evolution, Semi-supervised classification

Related items

1	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
2	Research On Classification Of Data Stream With Recurring Concept Drift
3	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
4	Classification Algorithm For Data Streams With Concept Drift And Its Applications
5	Research On Classification Algorithm For Conceptual Drift Data Flow
6	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
7	Research On The Classification Of Data Stream With Concept Drift Based On Cosine Similarity
8	Research On Multi-label Data Stream Semi-supervised Integrated Classification Method Based On Cooperative Training
9	Learning On Evolving Data Streams
10	Research And Distributed Implementation Of Stream Classification Based On Concept Drift