Study And Improvement Of Mining Algorithm On Data Stream

Posted on: 2016-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: R Qian
Full Text: PDF
GTID: 2308330473465481
Subject: Computer technology
Abstract/Summary:
Data mining faces new challenges in the Big Data era. Knowledge must be discovered and extracted accurately and immediately from data streams generated by a growing volume of raw data. Because of their magnitude and the time constraints on processing, these raw data no longer fit in main memory or external storage. Traditional data mining techniques therefore cannot be used to summarize data streams for diverse analyses. The widespread high-speed data stream model requires a single-pass mining algorithm that can handle fast-arriving data in limited memory with real-time response.

Guided by the project's needs, this thesis reviews the background of data stream mining and the classical algorithms for different mining models, drawing on an analysis of the similarities between traditional data mining and data stream mining. It then studies a series of decision tree classification algorithms based on Hoeffding trees from four angles: the underlying application, the split measure, the processing of continuous-valued attributes, and the sample size. Finally, the thesis proposes VFDTCA, an optimized algorithm for stationary data streams that improves on the classical VFDT in the discretization of continuous-valued attributes. As a further optimization, the Fayyad boundary point principle is introduced for computing the best split point of continuous-valued attributes. Theoretical analysis and experimental results show that, when the stream samples contain continuous-valued attributes, VFDTCA performs better at constructing the decision tree and predicting class labels when the Gini index is used as the split measure and the Fayyad boundary point principle is applied.
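The three ingredients named in the abstract (the Hoeffding bound, the Gini index, and Fayyad boundary points) can be illustrated with a minimal Python sketch. This is not the thesis's VFDTCA implementation: the function names, the batch-style lists of values and labels, and the choice of midpoints as cut values are all assumptions made for illustration. The Hoeffding bound uses the standard form epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split measure, delta the allowed error probability, and n the number of samples seen.

```python
import math
from collections import Counter


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def gini(labels):
    """Gini impurity of a collection of class labels (0.0 for an empty one)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def boundary_points(values, labels):
    """Candidate cuts between adjacent distinct values whose classes differ.

    Hypothetical helper: a cut between two adjacent attribute values is kept
    only if the samples at those two values do not all share a single class.
    """
    classes_at = {}
    for v, y in zip(values, labels):
        classes_at.setdefault(v, set()).add(y)
    ordered = sorted(classes_at)
    cuts = []
    for lo, hi in zip(ordered, ordered[1:]):
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2.0)  # midpoint is an illustrative choice
    return cuts


def best_split(values, labels):
    """Return (cut, weighted Gini) minimizing impurity over boundary points."""
    n = len(labels)
    best_cut, best_score = None, float("inf")
    for cut in boundary_points(values, labels):
        left = [y for v, y in zip(values, labels) if v <= cut]
        right = [y for v, y in zip(values, labels) if v > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(boundary_points(values, labels))
    print(best_split(values, labels))
    print(hoeffding_bound(1.0, 1e-7, 1000))
```

Restricting the search to boundary points means that cuts falling inside a run of same-class values are never evaluated, which reduces the number of candidate split points compared with testing every gap between adjacent values. In a Hoeffding-tree learner, a split would typically be accepted once the difference between the two best attributes' scores exceeds the bound for the current sample count.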
Keywords/Search Tags:Data Stream, Data Mining, Classification, Decision Tree, Hoeffding Bound