Research On Classification For Data Streams With Concept Drift

Posted on: 2010-09-03
Degree: Master
Type: Thesis
Country: China
Candidate: C X Pan
GTID: 2178360275977783
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology and computer networks, large volumes of data streams are generated in numerous fields, such as network security, stock exchange transactions, electronic commerce, and weather monitoring, and they hold abundant knowledge that urgently needs to be mined. As one of the primary branches of knowledge discovery, classification plays an important role in many applications, and classification of data streams has become a research hot spot in data mining. However, because continuously arriving data is unbounded and arrives at high speed, traditional algorithms are not applicable to data streams. Moreover, the information or concepts in streaming data vary with time or environment, a phenomenon known as concept drift, and how to detect and adapt to concept drift effectively has become a major challenge in data mining.

This dissertation addresses the problems mentioned above. Its main contributions are as follows:

(1) Existing algorithms for data stream classification are reviewed, and their merits and demerits in the presence of concept drift are analyzed.

(2) An Example-Weighted Algorithm for Mining Data Streams (EWAMDS) is proposed for data stream classification in the presence of concept drift. The weights of training examples are adjusted according to the base classifier's predictions so that the newly constructed classifier converges more quickly, and a dynamic weight-modifying factor is introduced to improve robustness to noise. Experimental results show the validity of this mechanism; compared with weighted bagging, EWAMDS has lower time consumption and higher accuracy.

(3) To reduce the negative influence of old, low-weight base classifiers on the adaptability of the ensemble classifier under abrupt concept drift, a concept drift detection model based on mean square error is proposed. The MSEBDM algorithm is devised on this model; it discards all classifiers when concept drift is detected. Experimental results show its validity.

(4) Based on the research above, an experimental system (EWAMDS) for classifying data streams has been implemented, and the algorithms mentioned are validated experimentally.
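
The abstract does not give the formulas behind EWAMDS or MSEBDM, so the following is only a rough sketch of the two ideas it names: reweighting training examples according to a base classifier's predictions, and discarding the whole ensemble when a mean-square-error check signals drift. All function names, the fixed `boost` factor, and the `tolerance` threshold are hypothetical choices made for illustration and are not taken from the thesis.

```python
def reweight_examples(labels, predictions, boost=2.0):
    """Return normalized example weights (assumed scheme, not the thesis formula).

    Examples the current base classifier misclassifies get `boost` times
    the weight of correctly classified ones.
    """
    weights = [boost if y != p else 1.0 for y, p in zip(labels, predictions)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_square_error(labels, scores):
    """Mean square error between 0/1 labels and predicted class-1 scores."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)


def drift_detected(baseline_mse, chunk_mse, tolerance=0.1):
    """Flag drift when the error on the newest chunk exceeds the baseline
    error by more than `tolerance` (a hypothetical fixed threshold)."""
    return chunk_mse > baseline_mse + tolerance


def update_ensemble(ensemble, new_classifier, drift):
    """Discard all old classifiers on drift; otherwise extend the ensemble."""
    if drift:
        return [new_classifier]
    return ensemble + [new_classifier]
```

In this sketch, a stream would be processed chunk by chunk: the ensemble's error on each new chunk is compared with a baseline from earlier chunks, and the result decides whether the old base classifiers are kept. The abstract's dynamic weight-modifying factor would replace the constant `boost` used here.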
Keywords/Search Tags: Data Streams, Classification, Concept Drift, Ensemble Learning