
Research And Application Of Decision Tree Classification Method Based On McDiarmid's Inequality

Posted on: 2020-03-13
Degree: Master
Type: Thesis
Country: China
Candidate: T Jia
Full Text: PDF
GTID: 2428330623457663
Subject: Computer technology
Abstract/Summary:
With the continuous development of information technology and big data, data stream models are widely used in many fields of social production and life, so collecting and analyzing data streams has become critical. The explosive growth of data streams means that ever more memory is needed to store them. Traditional data mining techniques, however, struggle to process data streams and to extract valuable information from them at scale. Researchers now use incremental decision tree methods for data stream classification, which is one way to mine useful information from large volumes of streaming data. This thesis first summarizes the background of data stream decision tree classification, including the definitions, concepts and characteristics of data streams. Second, it introduces the existing decision tree classification methods, covering both single-classifier and ensemble decision tree methods. It then studies a data stream decision tree classification algorithm based on McDiarmid's inequality. Finally, it designs an urban user behavior analysis and verification platform based on the decision tree classification method.

The main contributions of this thesis are as follows:

(1) The thesis introduces the concepts, characteristics and processing methods of data streams. It then analyzes and compares the classification methods currently used to process data streams: decision trees, support vector machines, Bayesian methods, neural networks, KNN and association/classification rules. Next, it analyzes data stream decision tree classification methods, both single-classifier and ensemble. The single-classifier methods include the very fast decision tree (VFDT) algorithm, algorithms derived from the very fast decision tree, and other types of decision tree algorithms. The ensemble methods include ensemble classification methods based on Hoeffding's inequality, ensemble classification methods derived from random decision trees, and other types of ensemble classification methods.

(2) Decision trees based on Hoeffding's inequality have two shortcomings: processing the data stream takes too long, and attribute splitting metrics such as the information gain and the Gini index cannot be expressed as a sum of real-valued random variables, which Hoeffding's inequality requires. Among them, 1 ? ?, which represents the number of attributes, is a real-valued random variable with a certain distribution. To further improve classification performance, this thesis proposes a data stream decision tree algorithm based on McDiarmid's inequality, called McDDT (McDiarmid Decision Tree). It also studies and uses t as the attribute classification metric. Compared with classical decision tree algorithms, McDDT significantly reduces running time while classification accuracy increases or remains almost unchanged. In addition, the number of nodes and the number of layers in the decision tree are significantly reduced.

(3) The thesis designs a user visit behavior analysis and verification platform based on the McDDT algorithm. The platform is built with the Tkinter framework in Python. It provides core functions such as data processing, data analysis and result display, which together predict the administrative area that a user has visited.
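The role the two inequalities play in a split decision can be illustrated with a minimal Python sketch. It is not the thesis's McDDT algorithm: the function names are hypothetical, and the bounded-difference constants for the information gain or the Gini index are not given in the abstract, so they are left as inputs. The sketch relies only on the general form of McDiarmid's inequality: if changing the i-th sample changes a function f by at most c_i, then P(f - E[f] >= eps) <= exp(-2 eps^2 / sum(c_i^2)). For the mean of n samples with range R, c_i = R/n and the bound reduces to the Hoeffding bound used by very fast decision trees.

```python
import math


def mcdiarmid_epsilon(bounded_differences, delta):
    """Deviation eps such that P(f - E[f] >= eps) <= delta (McDiarmid).

    bounded_differences: the constants c_i (assumed supplied by the caller);
    c_i bounds how much f can change when only the i-th sample changes.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    sum_sq = sum(c * c for c in bounded_differences)
    # Solve exp(-2 * eps**2 / sum_sq) = delta for eps.
    return math.sqrt(sum_sq * math.log(1.0 / delta) / 2.0)


def hoeffding_epsilon(value_range, n, delta):
    """Hoeffding bound for the mean of n samples with range value_range."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_score, second_score, epsilon):
    """Split a leaf once the best attribute leads the runner-up by more than epsilon."""
    return (best_score - second_score) > epsilon


# Illustrative values only (not results from the thesis).
n, value_range, delta = 1000, 1.0, 1e-7
eps_h = hoeffding_epsilon(value_range, n, delta)
eps_m = mcdiarmid_epsilon([value_range / n] * n, delta)
print(eps_h, eps_m)                    # the two bounds coincide for a sample mean
print(should_split(0.30, 0.15, eps_m))
print(should_split(0.30, 0.25, eps_m))
```

Keeping the constants c_i as a parameter is a deliberate choice here: the threshold formula stays the same for any splitting metric with bounded differences, and only the constants change.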
Keywords/Search Tags: data streams, classification, decision tree, Hoeffding's inequality, McDiarmid's inequality