
Research on Concept-Drifting Data Stream Mining Algorithms Based on a Multiple-Option Selection Mechanism

Posted on: 2011-12-06
Degree: Master
Type: Thesis
Country: China
Candidate: A L Ye
GTID: 2178360305472700
Subject: Computer application technology
Abstract/Summary:
In recent years, as data mining technology has matured, it has been widely applied in many sectors, especially banking, retail, transportation, and the Internet industry, where it has become a key technological pillar. The changes brought about by its development have attracted growing attention. Researchers have studied data mining in depth from many angles, which has accelerated its development over the past 10 years and produced many different kinds of data mining systems. The technology has also been integrated into many large database products.

As the applications of data mining continue to expand, new forms of data and new mining tasks have emerged, such as Web text mining, multimedia mining, image mining, and data stream mining. With the spread of the Internet, more and more data must be processed online and without delay, and data stream mining techniques have emerged in response. However, data streams are unbounded, time-varying, and fast, which makes them more difficult to handle than the static data of traditional data mining. A data stream algorithm must acquire knowledge in a single scan of the data: because the stream is unbounded, going back to rescan old data is very costly. The time variability of a data stream is often accompanied by concept drift, so a single classification or clustering algorithm cannot meet the required precision. The high speed of data streams also poses a major challenge to the efficiency of real-time algorithms.

Current research on data stream mining follows two main approaches, classification and clustering, of which classification is the more widely used.
For data stream classification, there are two main ideas. One is the ensemble approach, which integrates multiple classifiers into a classifier ensemble and dynamically adopts different classifiers according to their classification predictions on the training data set. The other is decision tree algorithms based on information gain, which mainly include VFDT and CVFDT. VFDT is a major improvement on decision trees for data stream mining: it makes data stream mining algorithms more compact and makes online analysis of the data stream more convenient. However, VFDT does not take concept drift into account. Concept drift, a difficult point in data stream mining, arises as the data stream changes over time. CVFDT is an algorithm that improves on VFDT to address this problem.

This thesis focuses on the concept drift problem in data stream classification and improves on CVFDT by proposing a multiple-option decision tree algorithm, mCVFDT. The algorithm introduces a multi-attribute selection mechanism into the node structure to overcome CVFDT's inability to detect concept drift automatically, while avoiding repeated traversal of the tree, in order to improve classification accuracy and efficiency. Experimental results show that, as the number of samples increases, the algorithm achieves better classification accuracy than CVFDT.
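As background for the information-gain decision trees mentioned above, the following is a minimal Python sketch of the split test commonly associated with VFDT: a leaf is split once the gap in information gain between the two best attributes exceeds the Hoeffding bound. The abstract does not describe this test, and it is not the mCVFDT mechanism proposed in the thesis. The function names, the confidence parameter `delta`, and the example numbers are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(class_counts):
    """Shannon entropy (in bits) of a list of per-class counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)


def information_gain(parent_counts, child_counts_list):
    """Entropy of the parent minus the weighted entropy of its children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child) for child in child_counts_list)
    return entropy(parent_counts) - weighted


def should_split(best_gain, second_gain, n_classes, delta, n):
    """Illustrative split test (hypothetical helper, not from the thesis).

    Information gain lies in [0, log2(n_classes)], so that is used as the range R.
    """
    epsilon = hoeffding_bound(math.log2(n_classes), delta, n)
    return (best_gain - second_gain) > epsilon


# Example with made-up statistics gathered at one leaf.
gain_a = information_gain([50, 50], [[45, 5], [5, 45]])    # strong attribute
gain_b = information_gain([50, 50], [[30, 20], [20, 30]])  # weak attribute
print(should_split(gain_a, gain_b, n_classes=2, delta=1e-7, n=100))
```

The bound shrinks as the number of examples `n` seen at a leaf grows, so the same gap in gain that is inconclusive after a few examples can justify a split later. This is what lets such a tree be built in a single pass without storing or rescanning old data.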
Keywords/Search Tags:Data Streams Mining, Multiple-options, CVFDT, mCVFDT