Classification Of Dynamic Data Stream With Noise

Posted on: 2017-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
Full Text: PDF
GTID: 2348330503968231
Subject: Computer application technology
Abstract/Summary:
With the rapid development of network communication and computer technology, data streams are generated in numerous fields, such as stock market analysis, weather forecasting, satellite monitoring, and network traffic monitoring. Data streams are rapid, continuous, and dynamic, which makes traditional data mining techniques ineffective for them. Dynamic data streams also exhibit concept drift: the concepts underlying the stream change over time. Handling concept drift effectively has therefore become an active research problem. Moreover, noise is pervasive in real environments, and noisy data degrades classification performance on a data stream. How to classify dynamic data streams with noise effectively therefore requires further study.

This thesis addresses concept drift and noise in data stream classification. The main research tasks are as follows:

(1) Related technologies for data mining and data stream classification are reviewed, and the concept drift and noise of data streams are described and analyzed in detail.

(2) To handle concept drift during data stream classification, a classification algorithm called EWDSCA (Examples of weighted for data streams classification algorithm) is proposed, based on the hypothesis that "an instance that does not conform to the current classification model may represent a changing trend of new concept". The method uses instance weighting to increase the influence of instances that may represent a new concept when the base classifier is constructed, so that the classification model adapts to the new concept. A dynamic weight adjusting factor is also introduced to improve the adaptability of the algorithm. Experiments show that, compared with weighted bagging, EWDSCA has higher efficiency and better classification results.

(3) In the real world, most data streams contain noisy data, and noise seriously degrades the classification performance of an algorithm. To solve this problem, a classification algorithm for dynamic data streams based on density clustering is designed, called FDNDCA (Fast-DBSCAN for noise data streams classification algorithm). The algorithm uses the fast clustering algorithm FDBSCAN to filter noise and builds a weighted ensemble model that uses UFFT as the base classifier together with instance weighting. In addition, a ? test method is introduced to detect concept drift. Experiments show that, compared with existing classification algorithms, FDNDCA achieves better classification performance on dynamic data streams with noise.
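
The two ideas named in (2) and (3), instance weighting and density-based noise filtering, can be illustrated with a minimal sketch. This is not the thesis's EWDSCA or FDNDCA implementation, whose details the abstract does not give. The function names `reweight_chunk` and `filter_sparse`, the fixed `boost` factor (standing in for the dynamic weight adjusting factor), and the neighbour-count rule (a simplification of DBSCAN-style noise detection, not FDBSCAN) are all assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One labelled instance: (feature vector, class label).
Instance = Tuple[Sequence[float], int]


def reweight_chunk(
    chunk: List[Instance],
    predict: Callable[[Sequence[float]], int],
    boost: float = 2.0,
) -> List[float]:
    """Return one weight per instance in the chunk.

    Instances the current model misclassifies may belong to a new
    concept, so they receive `boost` (hypothetical fixed factor);
    all others receive 1.0. Weights are scaled to have mean 1.
    """
    if not chunk:
        return []
    raw = [boost if predict(x) != y else 1.0 for x, y in chunk]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def filter_sparse(
    chunk: List[Instance], eps: float, min_pts: int
) -> List[Instance]:
    """Keep instances with at least `min_pts` other instances within `eps`.

    Simplified density filter: isolated points are treated as noise.
    Runs in O(n^2) per chunk; a real system would use a spatial index.
    """
    kept = []
    for i, (xi, yi) in enumerate(chunk):
        neighbours = sum(
            1
            for j, (xj, _) in enumerate(chunk)
            if j != i and math.dist(xi, xj) <= eps
        )
        if neighbours >= min_pts:
            kept.append((xi, yi))
    return kept
```

In a chunk-based pipeline under these assumptions, each incoming chunk would first pass through `filter_sparse`, and the surviving instances would then be weighted with `reweight_chunk` before the next base classifier is trained on them.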
Keywords/Search Tags: Data stream mining, Classification techniques, Concept drift, Noisy data