Research And Implementation Of Classification Algorithm For Positive And Unlabeled Examples Learning On Uncertain Data Stream

Posted on:2016-11-07

Degree:Master

Type:Thesis

Country:China

Candidate:S R Li

Full Text:PDF

GTID:2428330542454599

Subject:Computer application technology

Abstract/Summary:


Data streams are a typical form of data in the big data era. They are characterized by single-pass scanning, rapid change, large volume, and unbounded length. Uncertainty is an inherent property of data in real applications and takes two forms: attribute-level uncertainty and existential uncertainty. There is strong demand for extracting useful knowledge from uncertain data. Classification, a foundation of data mining, is widely applied in many fields. However, traditional classification methods are costly because they require fully labeled samples. They also cannot effectively handle knowledge that changes over time in a data stream, a phenomenon known as concept drift. Positive and unlabeled examples learning (PU learning) addresses the labeling problem because it does not require every example to be labeled. This thesis studies classification for PU learning on uncertain data streams.

First, the thesis introduces the problems posed by uncertain data streams and PU learning. It also reviews related work and the background and significance of the study, and summarizes the state of research in China and abroad. Second, it proposes an uncertain data classification algorithm for PU learning. The algorithm is based on the Weighted Extreme Learning Machine (ELM). It uses dimension reduction to handle uncertainty and clustering to extract reliable positive examples and reliable negative examples. It can handle attribute-level uncertainty and existential uncertainty simultaneously, and it is suited to the PU learning setting. Finally, the thesis proposes a classification algorithm for PU learning on uncertain data streams. This algorithm uses the previously proposed uncertain data classifier as its base classifier within an ensemble classification strategy. It detects concept drift from the similarity between the cluster sets of the current data block and a historical data block. If the cluster-set similarity exceeds a given threshold, concept drift is judged to have occurred, and the ensemble classifier is then updated according to the type of concept drift. Experiments show that the proposed algorithm can classify unlabeled data in uncertain data streams and performs well at detecting and handling concept drift.
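The abstract names the Weighted Extreme Learning Machine as the base learner but gives no formulas. The sketch below is a rough illustration only. It implements a generic two-class weighted ELM with a single hidden layer. The hidden layer's sigmoid weights are random and are never trained. The output weights are solved in closed form as beta = (I/C + H^T W H)^(-1) H^T W T, where H is the hidden-layer output matrix, T holds the +1/-1 targets, and W is a diagonal matrix of sample weights.

Several choices are assumptions made for this example rather than details taken from the thesis: the class-balanced weighting (each sample weighted by 1 / size of its class), the constant C, the hidden-layer size, and all names (`WeightedELM`, `fit`, `predict`). The thesis's dimension-reduction step for uncertain data is omitted, as is its clustering step for selecting reliable examples. In a PU setting, the +1 examples would presumably be the labeled positives and the -1 examples the reliable negatives extracted from the unlabeled data.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic activation for the hidden layer.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a, b):
    # Solve a @ x = b by Gaussian elimination with partial pivoting.
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


class WeightedELM:
    """Hypothetical two-class weighted ELM; labels are +1 / -1."""

    def __init__(self, n_hidden=20, c=100.0, seed=0):
        self.n_hidden = n_hidden  # number of random hidden neurons (assumed)
        self.c = c                # regularization constant C (assumed)
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Hidden-layer output h(x) from fixed random input weights and biases.
        return [_sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d = len(xs[0])
        # Input weights and biases are drawn once at random and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)]
                  for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        # Class-balanced sample weights: 1 / (size of the sample's class).
        counts = {label: ys.count(label) for label in set(ys)}
        sw = [1.0 / counts[y] for y in ys]
        h = [self._hidden(x) for x in xs]
        n = self.n_hidden
        # Regularized weighted normal equations: (I/C + H^T W H) beta = H^T W T.
        a = [[(1.0 / self.c if i == j else 0.0)
              + sum(wt * row[i] * row[j] for wt, row in zip(sw, h))
              for j in range(n)] for i in range(n)]
        rhs = [sum(wt * row[i] * y for wt, row, y in zip(sw, h, ys))
               for i in range(n)]
        self.beta = _solve(a, rhs)
        return self

    def predict(self, x):
        # Sign of the output score f(x) = h(x) . beta.
        score = sum(bt * hv for bt, hv in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1


# Toy, deliberately imbalanced data: 3 positives vs 12 negatives.
pos = [[2.0 + 0.1 * i, 2.0 - 0.1 * i] for i in range(3)]
neg = [[-2.0 + 0.2 * (i % 4), -2.0 + 0.2 * (i // 4)] for i in range(12)]
model = WeightedELM(n_hidden=20, c=100.0, seed=0).fit(pos + neg, [1] * 3 + [-1] * 12)
print(model.predict([2.0, 2.0]), model.predict([-2.0, -2.0]))
```

The per-class weights counteract the 3-to-12 imbalance in the toy data, so the minority class is not swamped in the least-squares fit. Pure Python with Gaussian elimination keeps the sketch dependency-free. A practical implementation would normally use a linear-algebra library instead.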

Keywords/Search Tags:

Uncertain Data Stream, PU Learning, ELM, Concept Drift, Ensemble Classifier


Related items

1	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
2	Research On Concept Drift Detection And Ensemble Classifier Based On Data Stream
3	Research On Data Streams Classification With Concept Drift
4	Research And Implementation Of Uncertain Data Streams Classification Technique Based On Distributed Extreme Learning Machine
5	Research On Decomposition Strategy And Ensemble Classifier Adaptive Learning Of Data Stream
6	Classifier Ensemble For Data Stream Classification
7	Research On Classification Algorithm For Conceptual Drift Data Flow
8	Research On Concept Drift Data Stream Classification Based On Ensemble Learning
9	Research On Concept Drift Data Stream Classification Algorithm Based On Ensemble Learning
10	Research On Classification Algorithms For Imbalanced Data Stream With Concept Drift