Research And Implementation Of Classification Algorithm For Positive And Unlabeled Examples Learning On Uncertain Data Stream

Posted on: 2016-11-07
Degree: Master
Type: Thesis
Country: China
Candidate: S R Li
GTID: 2428330542454599
Subject: Computer application technology
Abstract/Summary:
Data streams are typical of the big data era. They are characterized by single-pass scanning, rapid variation, vast volume, and unbounded length. Uncertainty is an inherent property of data in real applications and comprises attribute-level uncertainty and existential uncertainty, and there is strong demand for extracting useful knowledge from uncertain data. As a foundation of data mining, classification is widely applied in many fields. However, traditional classification methods require fully labeled samples, which are costly to obtain, and they cannot effectively handle knowledge that changes over time in a data stream, a phenomenon known as concept drift. Positive and unlabeled examples learning (PU learning) avoids this labeling cost because it does not require every example to be labeled. This thesis studies classification for PU learning on uncertain data streams.

First, the thesis introduces the problems of uncertain data streams and PU learning, related work, and the background and significance of the study, and then summarizes the state of research in China and abroad.

Second, the thesis proposes an uncertain data classification algorithm for PU learning. The algorithm is based on the Weighted Extreme Learning Machine (ELM); it uses dimension reduction to handle uncertainty and clustering to extract reliable positive and reliable negative examples. It handles attribute-level and existential uncertainty simultaneously and is suited to the PU learning setting.

Finally, the thesis proposes a classification algorithm for PU learning on uncertain data streams. It uses the uncertain data classifier above as the base classifier within an ensemble classification strategy, and it detects concept drift from the cluster set similarity between the current data block and historical data blocks. If the cluster set similarity exceeds a given threshold, concept drift is considered to have occurred, and the ensemble classifier is updated according to the type of concept drift. Experiments show that the proposed algorithm can classify unlabeled data in uncertain data streams and performs well in detecting and handling concept drift.
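The weighted-ELM base classifier for PU learning on uncertain data can be illustrated with a minimal, stdlib-only Python sketch. It is written under stated assumptions and is not the thesis's implementation. The abstract does not specify the dimension-reduction step, so the hypothetical `reduce_uncertain` simply collapses each attribute's sampled values to their mean. Existential uncertainty is modeled as a per-tuple existence probability that scales the sample weights. Reliable examples come from a k-means clustering of the unlabeled set: the cluster nearest the positive centroid supplies reliable positives and the farthest supplies reliable negatives. The output weights use the usual weighted-ELM ridge solution with class-balanced weights. All function names and the parameters `n_hidden`, `c`, and `k` are illustrative.

```python
import math
import random

# Hypothetical uncertainty handling: each uncertain attribute is a list of
# sampled values and is reduced to a single number (its mean).
def reduce_uncertain(attrs):
    return [sum(values) / len(values) for values in attrs]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, label per point)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return centers, labels

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

class WeightedELM:
    """Single-hidden-layer ELM; output weights solve (I/C + H^T D H) beta = H^T D y."""
    def __init__(self, n_hidden=20, c=100.0, seed=0):
        self.n_hidden, self.c, self.rng = n_hidden, c, random.Random(seed)

    def _hidden(self, x):
        return [sigmoid(sum(w * v for w, v in zip(ws, x)) + b)
                for ws, b in zip(self.w, self.b)]

    def fit(self, xs, ys, weights):
        d, n_h = len(xs[0]), self.n_hidden
        # Random, untrained input weights and biases (the ELM hidden layer).
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_h)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(n_h)]
        h = [self._hidden(x) for x in xs]
        a = [[(1.0 / self.c if i == j else 0.0)
              + sum(wt * hr[i] * hr[j] for hr, wt in zip(h, weights))
              for j in range(n_h)] for i in range(n_h)]
        rhs = [sum(wt * hr[i] * y for hr, wt, y in zip(h, weights, ys))
               for i in range(n_h)]
        self.beta = solve(a, rhs)
        return self

    def predict(self, x):
        score = sum(bt * hv for bt, hv in zip(self.beta, self._hidden(x)))
        return 1 if score >= 0 else -1

def train_pu_classifier(pos_records, unl_records, k=2):
    """Records are (uncertain_attrs, existence_probability) pairs."""
    pos = [(reduce_uncertain(a), p) for a, p in pos_records]
    unl = [(reduce_uncertain(a), p) for a, p in unl_records]
    pc = centroid([x for x, _ in pos])
    centers, labels = kmeans([x for x, _ in unl], k)
    # Nearest cluster -> reliable positives; farthest -> reliable negatives.
    near = min(range(k), key=lambda j: dist2(centers[j], pc))
    far = max(range(k), key=lambda j: dist2(centers[j], pc))
    rel_pos = pos + [r for r, lab in zip(unl, labels) if lab == near]
    rel_neg = [r for r, lab in zip(unl, labels) if lab == far]
    xs = [x for x, _ in rel_pos + rel_neg]
    ys = [1] * len(rel_pos) + [-1] * len(rel_neg)
    # Class-balanced weights (1 / class size), scaled by existence probability.
    ws = ([p / len(rel_pos) for _, p in rel_pos]
          + [p / len(rel_neg) for _, p in rel_neg])
    return WeightedELM().fit(xs, ys, ws)
```

The hand-written Gaussian elimination only keeps the sketch dependency-free; a practical version would use a linear-algebra library for the hidden-layer system.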
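The drift-detection and ensemble-update step can be sketched in the same hedged way. The abstract does not define its cluster set similarity, its drift categories, or its update rules, so everything below is a hypothetical stand-in. Each data block is assumed to have been clustered already (for example with k-means). `cluster_set_distance` scores how far apart two sets of centroids are; it is a dissimilarity, so larger values indicate a stronger change. `drift_type` maps that score to "none", "gradual", or "sudden" using illustrative thresholds. `Ensemble.update` either restarts the ensemble (sudden drift) or halves the weight of older members (gradual drift) before adding the classifier trained on the newest block.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_set_distance(old_centers, new_centers):
    """Hypothetical dissimilarity between two cluster sets: the symmetric mean
    distance from each centroid to the nearest centroid of the other set."""
    def one_way(src, dst):
        return sum(min(dist(c, d) for d in dst) for c in src) / len(src)
    return (one_way(old_centers, new_centers) + one_way(new_centers, old_centers)) / 2

def drift_type(score, gradual_t=1.0, sudden_t=3.0):
    """Map a dissimilarity score to a drift category (thresholds are illustrative)."""
    if score >= sudden_t:
        return "sudden"
    if score >= gradual_t:
        return "gradual"
    return "none"

class Ensemble:
    """Weighted-vote ensemble of base classifiers (callables returning +1 or -1)."""
    def __init__(self, max_size=5):
        self.members = []  # list of [classifier, weight], newest first
        self.max_size = max_size

    def update(self, new_clf, drift):
        if drift == "sudden":
            # Old concept is treated as obsolete: keep only the new classifier.
            self.members = []
        elif drift == "gradual":
            # Old concept fades: halve the weight of every existing member.
            for member in self.members:
                member[1] *= 0.5
        self.members.insert(0, [new_clf, 1.0])
        # Stable sort keeps newer members ahead of equally weighted older ones,
        # so truncation drops the oldest / lowest-weighted classifiers.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_size:]

    def predict(self, x):
        vote = sum(w * clf(x) for clf, w in self.members)
        return 1 if vote >= 0 else -1
```

Base classifiers are plain callables here, so any model with a ±1 prediction, such as a weighted ELM trained on the latest block, can be plugged in.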
Keywords/Search Tags: Uncertain Data Stream, PU Learning, ELM, Concept Drift, Ensemble Classifier