
Research on Concept Drift and Noise in Data Stream Classification

Posted on: 2013-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: B J Chen
Full Text: PDF
GTID: 2248330371973771
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology and computer networks, large volumes of data are generated in numerous fields, such as stock exchange transactions, weather monitoring, network security, and electronic commerce. Because these data arrive as streams, they are commonly called data streams. They contain a wealth of hidden knowledge that urgently needs to be mined. Classification is one of the primary branches of data mining research and has significant practical value, so data stream classification has become a research hot spot in data mining. However, data streams are rapid, continuous, and non-repeatable, which makes traditional algorithms unsuitable for processing them. On the one hand, the information or concepts in a data stream change with time or environment, a phenomenon known as concept drift. On the other hand, noisy data is inevitable in real environments and degrades the accuracy of classification models. Detecting and adapting to concept drift while handling noisy data effectively has therefore become a major challenge in data stream mining.

This thesis addresses concept drift and noise handling in data stream classification. The main work is as follows:

(1) Existing data stream classification algorithms are reviewed, and their merits and demerits in the presence of concept drift and noisy data are analyzed.

(2) To address problems in existing classification algorithms for data streams with concept drift, a classification algorithm based on LDA (linear discriminant analysis), called IUDE (Incremental Updated Discriminant Eigenspace), is proposed. The algorithm builds its model in the data feature space by analyzing that space, and classifies data with KNN (K-Nearest Neighbor). It updates the feature space with ILDA (Incremental LDA) to handle gradual concept drift and detects abrupt concept drift using the mean square error. Experimental results show that the algorithm can handle both gradual and abrupt concept drift in data stream classification.

(3) The classification quality of existing algorithms for data streams with concept drift decreases significantly when the stream contains noise. A new algorithm, FDBSCAN, an improved version of DBSCAN, is proposed for handling noisy data in data streams. Building on it, a new classification algorithm for data streams with concept drift, called NDSC, is proposed. Experimental comparison with typical existing data stream classification algorithms demonstrates the effectiveness of FDBSCAN in handling noisy data.
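The abstract does not say how the mean square error is computed or thresholded, so the following is only a minimal sketch of the general idea of flagging abrupt drift with an MSE test. It is not the IUDE algorithm. It assumes the stream arrives in batches, uses a plain centroid as a stand-in for the feature-space model, and introduces a hypothetical `ratio` threshold. All function names are illustrative.

```python
from statistics import fmean


def centroid(batch):
    """Per-feature mean of a batch of equal-length points."""
    return [fmean(column) for column in zip(*batch)]


def batch_mse(batch, center):
    """Mean squared distance of the batch's points from a reference center."""
    return fmean(
        sum((x - c) ** 2 for x, c in zip(point, center)) for point in batch
    )


def detect_abrupt_drift(batches, ratio=3.0):
    """Return the indices of batches flagged as abrupt drift.

    Assumption: the first batch defines the reference model (a centroid
    and its baseline MSE). A later batch is flagged when its MSE against
    the reference centroid exceeds `ratio` times the baseline, and the
    reference is then rebuilt from that batch. `ratio` is a hypothetical
    tuning parameter, not a value taken from the thesis.
    """
    batches = iter(batches)
    first = next(batches)
    center = centroid(first)
    baseline = batch_mse(first, center)
    drift_at = []
    for index, batch in enumerate(batches, start=1):
        if batch_mse(batch, center) > ratio * baseline:
            drift_at.append(index)
            # Rebuild the reference model from the drifted batch.
            center = centroid(batch)
            baseline = batch_mse(batch, center)
    return drift_at


# Toy stream: two batches around (0.5, 0.5), then two around (5.5, 5.5).
stream = [
    [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)],
    [(0.1, 0.1), (0.9, 1.0), (0.0, 0.9), (1.0, 0.2)],
    [(5.0, 5.0), (6.0, 6.0), (5.0, 6.0), (6.0, 5.0)],
    [(5.1, 5.0), (6.0, 5.9), (5.0, 6.1), (5.9, 5.0)],
]
drift_indices = detect_abrupt_drift(stream)
print(drift_indices)
```

On this toy stream only the third batch (index 2) is flagged. The fourth batch passes because the reference has already been rebuilt around the new concept. Gradual drift would instead call for incrementally updating the reference, which is the role the abstract assigns to ILDA.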
Keywords/Search Tags: Data Mining, Data Streams, Classification Mining, Concept Drifting, Noise Data