
Research On Data Stream Classification Algorithm Of Adapting To The Concept-drift

Posted on: 2011-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z X Cao
GTID: 2178330332960338
Subject: Computer software and theory
Abstract/Summary:
In recent years, a new field of research has emerged within data mining: data stream mining. Data mining is widely used in practical applications such as stock analysis, network fault detection, and credit card fraud detection, and classification is one of its important branches. Several well-established classification algorithms exist for data streams: VFDT, which is based on the Hoeffding tree; CVFDT, which adapts to concept drift; ensemble classifiers; and VFDTC. Ensemble classifiers are widely used in data stream classification. Concept drift, one of the defining characteristics of data streams, is the biggest challenge in classification mining, and the performance of a classification algorithm depends on its ability to adapt to it. Among current approaches, ensemble classifiers perform best.
The thesis first reviews the theoretical background of data mining and introduces the EC4.5 algorithm and the notion of concept drift. Compared with EC4.5, the CEEPCE algorithm improves accuracy, but its ability to adapt to concept drift is still inadequate. The classification algorithm proposed here adapts to concept drift through four operations on the ensemble: construction of classifiers, elimination, updating, and strengthening of the differences between classifiers, all aimed at optimizing classification performance. First, the thesis describes how the base classifiers are constructed and the criteria for eliminating them, using the characteristics of eEP to construct base classifiers that are more clearly distinguished from one another. Second, it works out the elimination standard for classifiers according to their error rate. Building on the characteristics of CEEPCE, the thesis then proposes two improvements. The first strengthens the differences between classifiers on the basis of classification error in order to improve the accuracy of the ensemble; the final set of classifiers is extracted on the premise that the performance of each base classifier is preserved. The second improves the update procedure: when the classifiers are updated, the average error rate is compared with the error rate of random classification to decide whether the opposite classifier should join the ensemble. Although this costs extra time, the advantage in adapting to concept change becomes apparent when large volumes of data must be classified.
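The update and elimination steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's CEEPCE-based implementation: the names `error_rate`, `opposite`, and `update_ensemble` are hypothetical, labels are assumed to be binary (0/1), the "opposite classifier" is assumed to mean a classifier that flips the prediction of the worst ensemble member, the random classification error rate is assumed to be 0.5, and elimination is assumed to keep the `max_size` members with the lowest error on the latest data chunk.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]          # (features, binary label 0/1)
Classifier = Callable[[Sequence[float]], int]  # maps features to a label


def error_rate(clf: Classifier, chunk: List[Example]) -> float:
    """Fraction of examples in a non-empty chunk that clf misclassifies."""
    wrong = sum(1 for x, y in chunk if clf(x) != y)
    return wrong / len(chunk)


def opposite(clf: Classifier) -> Classifier:
    """Assumed meaning of 'opposite classifier': flip a binary prediction."""
    return lambda x: 1 - clf(x)


def update_ensemble(
    ensemble: List[Classifier],
    candidate: Classifier,
    chunk: List[Example],
    max_size: int,
    random_error: float = 0.5,  # assumption: error of random guessing, 2 classes
) -> List[Classifier]:
    """Add the candidate, optionally add an opposite classifier, then prune."""
    pool = list(ensemble) + [candidate]
    errors = [error_rate(c, chunk) for c in pool]

    # Update step: if the pool does worse than random guessing on average,
    # the concept has probably changed, so the opposite of the worst member
    # is allowed to join (hypothetical reading of the abstract).
    if sum(errors) / len(errors) > random_error:
        worst = pool[errors.index(max(errors))]
        pool.append(opposite(worst))

    # Elimination step: keep the members with the lowest error on this chunk.
    pool.sort(key=lambda c: error_rate(c, chunk))
    return pool[:max_size]
```

Calling `update_ensemble` once per incoming chunk keeps the ensemble size bounded. Re-evaluating every member on each chunk is where the extra time mentioned in the abstract would be spent in a sketch like this one.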
Keywords/Search Tags: Data mining, Data streams, Classification, Concept drift, Integrated classifier