Research On Data Stream Classification Algorithm Of Adapting To The Concept-drift

Posted on:2011-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Cao

Full Text:PDF

GTID:2178330332960338

Subject:Computer software and theory

Abstract/Summary:

In recent years, data stream mining has emerged as a new research field within data mining. Data mining is widely used in practical applications such as stock analysis, network fault detection, and credit card fraud detection, and classification is one of its most important branches. Well-known data stream classification algorithms include VFDT, which is based on the Hoeffding tree; CVFDT, which adapts to concept drift; ensemble classifiers; and VFDTC. Ensemble classifiers are widely used in data stream classification. Concept drift, one of the defining characteristics of data streams, is the biggest challenge in classification mining. The performance of a classification algorithm depends on its ability to adapt to concept drift, and at present ensemble classifiers deliver the best performance among these algorithms.
The paper first introduces the theoretical background of data mining, then presents the EC4.5 algorithm and the notion of concept drift. Compared with EC4.5, the CEEPCE algorithm improves accuracy, but its ability to adapt to concept drift is still inadequate. The proposed algorithm adapts to concept drift through the construction, elimination, and updating of the classifiers in the ensemble, and through strengthening the differences among them to optimize classification performance. First, the paper describes how the base classifiers are constructed, drawing on the characteristics of eEP to make the base classifiers more distinct from one another. Second, it defines a criterion for eliminating classifiers according to their error rates. Building on the characteristics of CEEPCE, the paper then proposes two improvements. The first strengthens the differences among the classifiers based on their classification errors in order to improve the accuracy of the ensemble.
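The error-rate-based elimination criterion described above can be illustrated with a minimal Python sketch. It is not the thesis's actual criterion, which the abstract does not spell out: the capacity limit `capacity`, the threshold `max_error`, and the helper names `error_rate` and `eliminate` are all hypothetical. Each base classifier is scored on the newest labelled chunk of the stream, classifiers whose error exceeds the threshold are discarded, and only the most accurate survivors are kept.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a classifier is any callable mapping a feature vector
# to a class label, and a chunk is a list of (features, label) pairs.
Classifier = Callable[[Sequence[float]], int]
Chunk = List[Tuple[Sequence[float], int]]


def error_rate(clf: Classifier, chunk: Chunk) -> float:
    """Fraction of examples in the chunk that the classifier misclassifies."""
    if not chunk:
        return 0.0
    wrong = sum(1 for x, y in chunk if clf(x) != y)
    return wrong / len(chunk)


def eliminate(ensemble: List[Classifier], chunk: Chunk,
              capacity: int, max_error: float = 0.5) -> List[Classifier]:
    """Drop classifiers whose error on the newest chunk exceeds max_error,
    then keep at most `capacity` of the remaining ones, lowest error first."""
    scored = [(error_rate(clf, chunk), pos, clf) for pos, clf in enumerate(ensemble)]
    survivors = [item for item in scored if item[0] <= max_error]
    # Sort by error, breaking ties by position so callables are never compared.
    survivors.sort(key=lambda item: (item[0], item[1]))
    return [clf for _, _, clf in survivors[:capacity]]
```

Ties in error rate are broken by position in the ensemble, so members at earlier positions win when scores are equal. A real implementation might instead weight the surviving classifiers by their error, as many chunk-based ensembles do.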
In this first improvement, the final set of classifiers is extracted while ensuring the performance of each base classifier. The second improvement changes the update procedure: when the classifiers are updated, the algorithm decides whether to add the opposite classifier by comparing the average error rate with the error rate of random classification. Although this costs additional time, its advantage in adapting to concept change becomes evident when large amounts of data must be classified.
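The update rule in the second improvement can be sketched in the same hedged way. The abstract does not define the "opposite classifier" or the random classification error rate, so this sketch assumes binary labels 0/1, takes the opposite classifier to be one that always predicts the other label, and uses 0.5 (the expected error of uniform random guessing over two classes) as the random error rate. The names `admit`, `opposite`, and `RANDOM_ERROR` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types; binary labels 0/1 are assumed throughout.
Classifier = Callable[[Sequence[float]], int]
Chunk = List[Tuple[Sequence[float], int]]

RANDOM_ERROR = 0.5  # expected error of uniform random guessing over two classes


def error_rate(clf: Classifier, chunk: Chunk) -> float:
    """Fraction of examples in a non-empty chunk that the classifier misclassifies."""
    return sum(1 for x, y in chunk if clf(x) != y) / len(chunk)


def opposite(clf: Classifier) -> Classifier:
    """Classifier that always predicts the other binary label."""
    def inverted(x: Sequence[float]) -> int:
        return 1 - clf(x)
    return inverted


def admit(ensemble: List[Classifier], candidate: Classifier,
          recent_chunks: List[Chunk]) -> List[Classifier]:
    """Append the candidate, or its opposite if the candidate's average
    error over the recent chunks is worse than random guessing."""
    avg_error = sum(error_rate(candidate, c) for c in recent_chunks) / len(recent_chunks)
    chosen = opposite(candidate) if avg_error > RANDOM_ERROR else candidate
    return ensemble + [chosen]
```

With two classes, inverting every prediction turns each error into a correct answer and vice versa, so a classifier with error rate e on a chunk has an opposite with error rate 1 - e on the same chunk. Whenever a candidate's average error exceeds 0.5, its opposite therefore does better than random guessing on those chunks.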

Keywords/Search Tags:

Data mining, Data streams, Classification, Concept drift, Ensemble classifier

Related items

1	Research On Classification Technologies In Mining Unsteady Data Streams
2	Study On Data Streams Online Classification Algorithm Of Adapting To The Concept-Drift
3	Classification Algorithm For Data Streams With Concept Drift And Its Applications
4	Research On Classification For Data Streams With Concept Drift
5	Research On Concept Drift And Noisy In Data Streams Classification
6	Research On Ensemble Classification Algorithms Of Data Stream Based On Concept Drift
7	Research On Classification Algorithm For Conceptual Drift Data Flow
8	Study On Data Streams Classification Algorithms Based On Ensemble Classifier
9	Research On Mining Algorithms Over Data Streams
10	Research On Data Streams Classification With Concept Drift