
Research on Data Stream Classification with Hidden Concept Drift

Posted on: 2009-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Hou
Full Text: PDF
GTID: 2178360272970947
Subject: Computer application technology
Abstract/Summary:
Streaming data in telecommunications, networking, and other application fields is high-speed, continuous, variable, and open-ended. Data streams contain useful information, so mining previously unknown, valuable patterns from them can have a great influence on network security and enterprise decision-making. Data stream mining has broad potential application in fields such as government administration, commercial decision-making, and information security. Traditional data mining algorithms, however, are no longer suitable because of concept drift in data streams. Research on highly accurate and stable data stream mining systems has therefore become valuable in both theory and practice.

The frequency of a concept drift may be regarded as the number of times a concept reappears. This dissertation explores the frequency-based characteristics of concept drift and the effect of low-frequency concepts on data stream classification. Two algorithms are proposed: one identifies the frequency of concept drift, and the other (LFCR) reduces low-frequency concepts through several mechanisms. The contents of the dissertation are as follows:

(1) The background and development of data stream mining are discussed, along with some existing problems of related algorithms.
(2) Traditional classification algorithms and the problems of data stream classification are analyzed. The effect of concept drift on data stream classification is discussed, and the deficiencies of existing classification algorithms that handle underlying concept drift are identified.
(3) Based on the frequency characteristics of concept drift, a new algorithm is designed to identify the frequency of concept drift. The rule by which concepts change can then be used to predict the next concept, which can improve the time performance of classification.
(4) The effect of low-frequency concepts on time and space performance is analyzed, and the LFCR algorithm, which reduces low-frequency concepts through several mechanisms, is proposed. The experimental results show that the LFCR algorithm has good time performance.
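The abstract does not give the details of either algorithm, so the following is only a minimal illustrative sketch of the general idea, not the dissertation's method. It assumes the stream has already been segmented into a sequence of concept labels, counts how often each concept appears as a distinct episode, drops concepts below a threshold, and predicts the next concept from observed transitions. The function names, the `min_count` threshold, and the label representation are all hypothetical.

```python
from collections import Counter, defaultdict


def concept_frequencies(concept_history):
    """Count how many separate episodes (runs) of each concept occur.

    Assumption: concept_history is a sequence of concept labels, one per
    data chunk, produced by some drift detector not shown here.
    """
    freq = Counter()
    prev = None
    for concept in concept_history:
        if concept != prev:  # a new episode starts whenever the label changes
            freq[concept] += 1
            prev = concept
    return freq


def prune_low_frequency(freq, min_count):
    """Return the set of concepts seen in at least min_count episodes.

    min_count is a hypothetical threshold; the abstract does not state
    how LFCR decides which concepts count as low-frequency.
    """
    return {concept for concept, n in freq.items() if n >= min_count}


def predict_next_concept(concept_history, current):
    """Predict the most frequent successor of `current` from past drifts.

    Returns None when `current` has never been followed by another concept.
    """
    transitions = defaultdict(Counter)
    prev = None
    for concept in concept_history:
        if prev is not None and concept != prev:
            transitions[prev][concept] += 1
        prev = concept
    successors = transitions.get(current)
    if not successors:
        return None
    return successors.most_common(1)[0][0]
```

For example, with the history `["A", "A", "B", "A", "C", "A", "B", "B", "A"]`, concept `C` occurs in only one episode and would be dropped at `min_count=2`, while `B` is the most common successor of `A`. Keeping only recurring concepts bounds the number of stored models, which is one plausible reading of how reducing low-frequency concepts could help time and space performance.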
Keywords/Search Tags:data mining, data streaming, concept drift, classification, frequency