Research On Data Stream Classification Algorithm With Limited Amount Of Labeled Data

Posted on: 2015-10-03  Degree: Master  Type: Thesis
Country: China  Candidate: Z T Ren  Full Text: PDF
GTID: 2298330422983374  Subject: Computer application technology
Abstract/Summary:
With the growing popularity and rapid development of network technology, massive, real-time, continuous, and dynamically changing data are being produced in many emerging areas, such as real-time monitoring systems, communication networks, and wireless sensor networks. In the academic literature, such data are called data streams. Traditional mining methods are ill-suited to analyzing this kind of data, so new stream mining methods are needed, and this research has attracted extensive attention from scholars. Data stream classification is a very important technology in the field of data stream mining. A classifier first learns from a large amount of labeled data; the knowledge extracted from these data is then used to predict the labels of unknown data. As time and the environment change, the concepts and knowledge underlying the data may change continually, a phenomenon known as concept drift. An efficient data stream classification algorithm needs to complete the classification task with good accuracy under limited time and memory, and must adapt to concept drift.

Most existing data stream classification algorithms are supervised learning methods, which must be trained on labeled data. However, labeling data costs a great deal of time and resources. Clustering algorithms do not have this problem, but they ignore the small amount of labeled data that is available, which reduces their accuracy. In view of these problems, this paper studies data stream classification with limited labeled data and proposes two data stream classification algorithms, SKAOGClass and SMEClass. SKAOGClass is an incremental classification algorithm that uses a K-associated graph to represent the topological structure of, and similarity between, samples, and then distinguishes their categories.
We take the K-associated optimal graph as the basic model and design a semi-supervised method for constructing it; the resulting semi-supervised K-associated optimal graph is then used to build a basic nonparametric classifier. When a new sample arrives, we convert it to a vertex, connect it to the main graph, and estimate its label using Bayesian theory. SMEClass combines the advantages of horizontal and vertical ensembles, selecting decision tree classifiers and a Bayesian classifier to construct an ensemble model. When a new data block arrives, we build a decision tree classifier from its labeled data, then use the new classifier together with the existing ensemble classifier to label the unlabeled samples by voting, and update the ensemble classifier. The Bayesian classifier monitors the labeling process in the ensemble model and filters out noisy data. The experimental results show that both SKAOGClass and SMEClass achieve high accuracy and strong reliability. In addition, they can be applied to data streams with limited labeled data.
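The block-wise update described for SMEClass can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the thesis's implementation: the names `CentroidClassifier`, `majority_vote`, and `process_block` are hypothetical, a nearest-centroid learner stands in for the decision tree, the ensemble is simply capped at a fixed size, and the Bayesian noise filter is omitted entirely.

```python
from collections import Counter, deque
from math import dist


class CentroidClassifier:
    """Hypothetical stand-in base learner (the abstract names a decision tree)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        # Mean feature vector per class.
        self.centroids = {
            label: [v / counts[label] for v in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        # Label of the closest class centroid (Euclidean distance).
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def majority_vote(classifiers, x):
    """Return the label predicted by the most classifiers."""
    votes = Counter(clf.predict(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


def process_block(ensemble, labeled, unlabeled):
    """Train on one block's labeled samples, vote on its unlabeled samples,
    then add the new classifier to the ensemble.

    ensemble  -- deque of fitted classifiers (maxlen bounds memory; assumption)
    labeled   -- list of (features, label) pairs
    unlabeled -- list of feature tuples
    """
    X, y = zip(*labeled)
    new_clf = CentroidClassifier().fit(X, y)
    voters = list(ensemble) + [new_clf]
    pseudo_labels = [majority_vote(voters, x) for x in unlabeled]
    ensemble.append(new_clf)  # a full deque silently drops its oldest member
    return pseudo_labels


ensemble = deque(maxlen=5)
block_labeled = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
block_unlabeled = [(0.1, 0.2), (0.9, 0.8)]
print(process_block(ensemble, block_labeled, block_unlabeled))  # ['a', 'b']
```

Dropping the oldest classifier when the ensemble is full is one simple way to bound memory and follow concept drift; the abstract does not state which update policy SMEClass actually uses.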
Keywords/Search Tags: data stream, classification, semi-supervised learning, K-associated graph, ensemble classification