Research On Data Stream Classification Algorithm With Limited Amount Of Labeled Data

Posted on:2015-10-03

Degree:Master

Type:Thesis

Country:China

Candidate:Z T Ren

Full Text:PDF

GTID:2298330422983374

Subject:Computer application technology

Abstract/Summary:

With the growing popularity and rapid development of network technology, massive, real-time, continuous, and dynamically changing data are being produced in many emerging areas, such as real-time monitoring systems, communication networks, and wireless sensor networks. Such data are referred to in the academic literature as data streams. Traditional mining methods are ill-suited to analyzing this kind of data, so new stream mining methods are needed, and this research has attracted extensive attention from scholars. Data stream classification is an important technique in data stream mining. A classifier first learns from a large amount of labeled data, and the knowledge extracted from these data is then used to predict the labels of unseen data. As time and the environment change, the concepts and knowledge underlying the data may change continually, a phenomenon known as concept drift. An efficient data stream classification algorithm must complete the classification task with good accuracy within limited time and memory, and must adapt to concept drift.

Most existing data stream classification algorithms are supervised learning methods, which require labeled data for training. However, labeling data costs a great deal of time and resources. Clustering algorithms do not have this problem, but they ignore the small portion of data that is labeled, which reduces their accuracy. In view of these problems, this thesis studies data stream classification with limited labeled data and proposes two data stream classification algorithms, SKAOGClass and SMEClass.

SKAOGClass is an incremental classification algorithm that uses a K-associated graph to represent the topological structure of, and similarity between, samples, and then distinguishes the categories. We take the K-associated optimal graph as the basic model, design a semi-supervised construction method for it, and use the resulting semi-supervised K-associated optimal graph to build a basic nonparametric classifier. When a new sample arrives, we convert it to a vertex, connect it to the main graph, and estimate its label using Bayesian theory.

SMEClass combines the advantages of horizontal and vertical ensembles, using decision tree classifiers and a Bayesian classifier to construct an ensemble model. When a new data block arrives, we construct a decision tree classifier from its labeled data, use the new classifier together with the existing ensemble classifier to label the unlabeled samples by voting, and then update the ensemble classifier. The Bayesian classifier monitors the labeling process in the ensemble model and filters out noisy data. The experimental results show that both SKAOGClass and SMEClass achieve high accuracy and strong reliability, and that they are applicable to data streams with limited labeled data.
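
The block-by-block labeling loop described for SMEClass can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on several assumptions that the abstract does not state: a depth-1 decision tree (`Stump`) stands in for the full decision tree, a Gaussian naive Bayes model plays the monitoring Bayesian classifier, a voted label is kept only when the monitor agrees with it, and the ensemble is updated by appending the new classifier and dropping the oldest beyond `max_size`. All class, function, and parameter names are hypothetical.

```python
import math
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


class Stump:
    """Depth-1 decision tree; a stand-in for the full decision tree."""

    def fit(self, X, y):
        best_err = len(y) + 1
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                left = [lab for x, lab in zip(X, y) if x[f] <= t]
                right = [lab for x, lab in zip(X, y) if x[f] > t]
                lo = majority(left)
                hi = majority(right) if right else lo
                err = sum(lab != lo for lab in left) + sum(lab != hi for lab in right)
                if err < best_err:
                    best_err = err
                    self.f, self.t, self.lo, self.hi = f, t, lo, hi
        return self

    def predict(self, x):
        return self.lo if x[self.f] <= self.t else self.hi


class GaussianNB:
    """Gaussian naive Bayes, used here as the monitoring classifier."""

    def fit(self, X, y):
        self.stats = {}
        for c in set(y):
            rows = [x for x, lab in zip(X, y) if lab == c]
            means = [sum(col) / len(rows) for col in zip(*rows)]
            # Small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                         for col, m in zip(zip(*rows), means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def predict(self, x):
        def score(c):
            prior, means, variances = self.stats[c]
            return prior - 0.5 * sum(
                math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                for xi, m, v in zip(x, means, variances))
        return max(self.stats, key=score)


def process_chunk(ensemble, X_lab, y_lab, X_unlab, max_size=5):
    """Handle one data block: train, vote, filter, update (assumed rules)."""
    new_clf = Stump().fit(X_lab, y_lab)
    monitor = GaussianNB().fit(X_lab, y_lab)
    voters = ensemble + [new_clf]
    accepted = []
    for x in X_unlab:
        vote = majority([clf.predict(x) for clf in voters])
        # Assumed filter: discard labels the Bayesian monitor disagrees with.
        if monitor.predict(x) == vote:
            accepted.append((x, vote))
    # Assumed update rule: keep only the most recent max_size classifiers.
    return voters[-max_size:], accepted


# Usage: one block with four labeled and three unlabeled samples.
X_lab = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.1, 0.1], [1.0, 1.0], [0.1, 1.0]]
ensemble, accepted = process_chunk([], X_lab, y_lab, X_unlab)
print(accepted)
```

In this toy block, the third unlabeled sample receives a voted label that the monitor contradicts, so it is left unlabeled rather than risk adding a noisy label to later training data. Other acceptance rules, such as a probability threshold, would fit the abstract's description equally well.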

Keywords/Search Tags:

data stream, classification, semi-supervised learning, K-associated graph, ensemble classification

Related items

1	Research On Semi-supervised Classification Of Data Stream Based On Adaptive Density Clustering
2	Research On Semi-supervised Data Stream Classification Method Based On Ensemble Model
3	Research On Semi-supervised Classification Algorithm For Data Stream With Concept Drift
4	Ensemble Based Semi-supervised Learning For Fault Classification
5	Research Of Reliable Semi-supervised Classification
6	Research On Image Classification Algorithm Based On Semi-supervised Learning
7	Selection And Classification Of Unbalanced Data Based On Semi-supervised And Integrated Learning
8	Semi-supervised Ensemble Learning For Hyperspectral Image Classification
9	Research On Semi-supervised Classification Of Data Stream Based On Clustering
10	Research On Semi-supervised Classification Algorithm Based On Clustering Ensemble