
Research On Ensemble Classifier Of Datastream Based On UFFT

Posted on: 2011-04-29  Degree: Master  Type: Thesis
Country: China  Candidate: T T Zhen  Full Text: PDF
GTID: 2178360308972941  Subject: Computer application technology
Abstract/Summary:
The scalability of data mining methods is constantly being challenged by real-time production systems that generate tremendous amounts of data at unprecedented rates. Examples of such data streams include network event logs, telephone call records, credit card transaction flows, and sensor streams. The task is to learn the underlying data-generating mechanism, or the knowledge contained in those data. The overwhelming volume of streaming data is one of the challenges that knowledge discovery tools face.

Classification of data streams has become a research hot spot in data mining. Classification algorithms fall into two categories: single classifiers and ensemble classifiers. Classifier ensembles are aggregations of several classifiers whose individual predictions are combined in some manner to form a final prediction. Conventional ensemble classifiers are accurate in classification and efficient in learning the model. However, their base models are built by non-incremental algorithms, which is costly in time and space. To address this problem, we propose a new method to optimize the ensemble classifier. The main contributions of this thesis are as follows:

(1) Existing work on data stream research is reviewed, including statistical and computational approaches and algorithms. We introduce data stream classification algorithms and describe ensemble classifier techniques.

(2) The inherent weakness of the weighted-bagging model is that training is time- and space-consuming, so it does not adapt well to real-time data streams. A new data stream mining method called UFFT_wb is proposed to solve this problem; it is based on the weighted-bagging model and uses the UFFT algorithm to build the base classifiers. Experimental results show that UFFT_wb needs less time to choose the cut point for splitting tests, needs little space to build a new node, and constructs its models incrementally. While maintaining similar accuracy, the method is superior in time consumption to the weighted-bagging algorithm based on C4.5.

(3) Based on the research above, an experimental system (UFFT_wb) for classifying data streams has been implemented.
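The idea of combining weighted base classifiers that are trained incrementally on a stream can be illustrated with a minimal sketch. It is not the thesis's UFFT_wb implementation: the class names, the `learn_one`/`predict_one` methods, the nearest-centroid base learner (a simple stand-in for a UFFT tree), and the accuracy-based weighting rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


class CentroidLearner:
    """Hypothetical incremental base learner (stand-in for a UFFT tree).

    Keeps a running sum and count per class, so each example is seen once.
    """

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
            self.counts[y] = 0
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.counts:
            return None

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        # Predict the class whose centroid is closest to x.
        return min(self.counts, key=sq_dist)


def accuracy(model, chunk):
    """Fraction of (x, y) pairs in chunk that the model labels correctly."""
    if not chunk:
        return 0.0
    hits = sum(1 for x, y in chunk if model.predict_one(x) == y)
    return hits / len(chunk)


class WeightedEnsemble:
    """Assumed scheme: weight each member by its accuracy on the newest chunk."""

    def __init__(self, max_members=5):
        self.members = []  # list of [model, weight]
        self.max_members = max_members

    def process_chunk(self, chunk):
        # Re-weight existing members on the newest data.
        for member in self.members:
            member[1] = accuracy(member[0], chunk)
        # Train a new member incrementally on the chunk (one pass).
        new_model = CentroidLearner()
        for x, y in chunk:
            new_model.learn_one(x, y)
        # Simplification: the newest member gets full weight.
        self.members.append([new_model, 1.0])
        # Keep only the highest-weighted members.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]

    def predict_one(self, x):
        # Weighted vote over the members' predictions.
        votes = defaultdict(float)
        for model, weight in self.members:
            label = model.predict_one(x)
            if label is not None:
                votes[label] += weight
        return max(votes, key=votes.get) if votes else None
```

In this sketch, members that no longer fit the newest chunk lose voting weight, so the ensemble follows changes in the stream without retraining old members from scratch.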
Keywords/Search Tags: Data Streams, Ensemble Learning, UFFT, weighted-bagging