
Distributed Decision Tree Algorithm Based on a Region Factor and Its Grid Model

Posted on: 2009-03-03 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Y Kong
GTID: 2178360245486076 | Subject: Management Science and Engineering
Abstract/Summary:
With the development of information technology and the progress of economic globalization, a large number of chain business enterprises have emerged. Their many distributed chain stores, logistics warehouses and registered offices are connected to each other through networks, which forms a distributed data environment. In this setting, the data in a business database has several important characteristics: (1) a large amount of new data is added to the database every day, and the volume keeps growing; (2) the data is distributed across the chain stores; (3) the data stored in different places has different characteristics, each of which is worth analyzing; (4) the data is stored in different formats in different places, which makes joint analysis difficult. This paper studies various traditional decision tree algorithms, including centralized algorithms (such as ID3 and C4.5) and distributed algorithms (such as SPRINT and SLIQ). None of these algorithms is designed for chain business enterprises, so applying them directly in that setting is impractical, and they struggle to meet the demands of increasingly complex data mining. This paper examines the relationships among the distributed databases belonging to one chain enterprise and argues that people in different regions have different consumption habits because of differences in lifestyle, economic level and population factors.
Therefore, the data recording customer behavior in different regions shows different characteristics. The paper then proposes ZDT, a distributed decision tree algorithm based on a region factor. Aimed at the regional characteristics of chain business enterprises, ZDT adds a region factor to the J4.8 algorithm, uses this attribute as the first splitting criterion, and then uses the information gain ratio as the splitting criterion for the remaining splits. The paper introduces the concepts of the region factor and the characteristic difference between decision trees, and gives an algorithm for computing the characteristic difference. A region-branch algorithm obtains the region-branch heads of the decision tree, and a characteristic-difference-ratio algorithm measures how much the decision trees of different regions differ. Finally, decision trees whose characteristic-difference ratio exceeds a specified threshold are merged, so that the resulting decision tree does not grow too large. The ZDT algorithm is implemented with grid technology as a grid-based ZDM system, called GZDM, which uses a variety of open-source tools to run ZDT in a distributed manner. Experimental testing and analysis show that GZDM and the ZDT algorithm are feasible and, to a certain extent, practical. The work provides both a theoretical basis and a practical operating model for distributed commercial data mining in chain business enterprises.
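The abstract states that ZDT splits on the region factor first and then, as in C4.5/J4.8, uses the information gain ratio for later splits, where GainRatio(A) = (H(S) - sum_v |S_v|/|S| * H(S_v)) / SplitInfo(A) and SplitInfo(A) = -sum_v |S_v|/|S| * log2(|S_v|/|S|). The following is a minimal Python sketch of only that first step, assuming records are dicts of categorical attributes. The function names, attribute names and toy data are hypothetical, and the characteristic-difference ratio and tree merging are not shown because the abstract does not define them.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Standard C4.5-style gain ratio of splitting `rows` on `attr`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    n = len(rows)
    base = entropy([r[target] for r in rows])
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    if split_info == 0:
        return 0.0  # attribute has a single value here: splitting is useless
    return (base - cond) / split_info


def region_first_split(rows, region_attr, attrs, target):
    """Hypothetical sketch: partition by the region attribute first, then
    pick the best gain-ratio attribute independently inside each region.
    `attrs` should not include `region_attr`."""
    by_region = defaultdict(list)
    for r in rows:
        by_region[r[region_attr]].append(r)
    return {
        region: max(attrs, key=lambda a: gain_ratio(subset, a, target))
        for region, subset in by_region.items()
    }


# Hypothetical toy data: customer purchases in two regions.
rows = [
    {"region": "north", "weather": "cold", "age": "young", "buys": "coat"},
    {"region": "north", "weather": "cold", "age": "old", "buys": "coat"},
    {"region": "north", "weather": "warm", "age": "young", "buys": "shirt"},
    {"region": "south", "weather": "warm", "age": "young", "buys": "shirt"},
    {"region": "south", "weather": "warm", "age": "old", "buys": "tea"},
    {"region": "south", "weather": "cold", "age": "old", "buys": "tea"},
]

print(region_first_split(rows, "region", ["weather", "age"], "buys"))
# → {'north': 'weather', 'south': 'age'}
```

Because the best attribute is chosen separately in each region, the two regions in the toy data end up with different second-level splits, which is the kind of regional difference in customer behavior the abstract describes.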
Keywords/Search Tags: region factor, decision tree, grid, Weka, Globus Toolkit