Classification Algorithm Study And Application Based On Decision Tree

Posted on: 2006-10-27
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y Chi
GTID: 2178360182997473
Subject: Management Science and Engineering
Abstract/Summary:
In the information era, the emergence of massive data has made it a major challenge to use the enormous volume of raw data effectively to understand the current situation and forecast the future. Data mining technology emerged and developed rapidly in response. Data mining is the process of extracting hidden and potentially useful information from databases. It is a new data analysis technology, widely used in banking and finance, insurance, government, education, transportation and other sectors, as well as in national defense research. The breadth of its applications and its substantial economic and social benefits have attracted many specialists and research institutes to the field.

Data classification is one of the important topics in data mining. Among the many classification methods, decision tree induction is widely used because it yields explicit rules easily, requires relatively little computation, exposes the attributes that matter most to a decision, and achieves relatively high classification accuracy. According to related statistics, the decision tree is currently one of the most widely used data mining algorithms, with a utilization rate of 19%. Its applications have extended from medicine to game theory, business and other fields, and it has become the basis of some business rule induction systems.

Finding methods for constructing and simplifying decision trees has always been an active research topic. The SLIQ and SPRINT algorithms handle well the case where the data stored on disk is too large to fit in memory. Rather than sampling or sorting the database to obtain a small data set that fits in memory, they use a new data structure to construct a decision tree directly from the whole database.
However, SPRINT and SLIQ assume a training set of fixed size, drawn from a stable environment with little human intervention, and they generally ignore how the data changes over time. In reality a database is not static; new data arrives continuously. It is therefore worthwhile to improve the existing algorithms so that they adapt to a growing training set and build the new tree from the old one.

This thesis arises from that background. Its purpose is to study knowledge discovery in databases in depth, to investigate decision tree updating in data mining, and to apply the results to practical tasks. The main work is as follows:

1. Point out that the key issue in constructing a good decision tree is how to select good logical tests, i.e. splitting attributes. Compare the advantages and disadvantages of two attribute selection measures, information gain and the Gini index. Propose a method for constructing a binary tree by computing the Gini index over combinations of discrete values.
2. Study several decision tree construction and pruning algorithms and compare their advantages and disadvantages, with a thorough analysis of the scalable SPRINT algorithm. Investigate the integration of tree construction with pruning, namely the PUBLIC algorithm, and the RAINFOREST framework, which can be applied to other algorithms and is used in the examples that follow.
3. For the realistic case in which data arrives in the database at high speed, introduce a new decision tree construction and pruning algorithm for dynamically changing databases, drawing on the ideas of the SPRINT algorithm, the PUBLIC algorithm and the RAINFOREST framework (–cc sheet), so as to accommodate a continuously growing database and improve the algorithm's real-time performance.
4. Apply the decision tree to the field of auto sales.
Analyze questionnaire data to identify the characteristics of customers who intend to purchase within one year. The results can serve as a reference for devising marketing strategy and placing advertisements, helping an auto sales company strengthen its market competitiveness and increase its sales volume.
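
The abstract does not reproduce the thesis's own procedure, so the following is only an illustrative sketch of the two attribute selection measures named in item 1, together with a binary split chosen by evaluating the Gini index over groupings of an attribute's discrete values. The function names and the toy data are hypothetical, and the exhaustive search over subsets is the standard textbook approach, not necessarily the method the thesis proposes.

```python
from collections import Counter
from itertools import combinations
from math import log2


def gini(labels):
    """Gini index of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction from a multiway split on a discrete attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_binary_gini_split(values, labels):
    """Search every two-way grouping of the attribute's distinct values.

    Returns (left_value_set, weighted_gini); left_value_set is None when
    the attribute has fewer than two distinct values.
    """
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None, gini(labels)
    n = len(labels)
    best_set, best_score = None, float("inf")
    # Pin the first value to the left side so each partition is tried once.
    first, rest = distinct[0], distinct[1:]
    for r in range(len(rest)):  # r < len(rest) keeps the right side non-empty
        for combo in combinations(rest, r):
            left_set = {first, *combo}
            left = [y for v, y in zip(values, labels) if v in left_set]
            right = [y for v, y in zip(values, labels) if v not in left_set]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best_set, best_score = left_set, score
    return best_set, best_score


if __name__ == "__main__":
    # Made-up toy data: an income band and a purchase-intent label.
    income = ["low", "low", "low", "mid", "high", "high"]
    intent = ["no", "no", "no", "yes", "yes", "yes"]
    print(information_gain(income, intent))
    print(best_binary_gini_split(income, intent))
```

The subset search grows exponentially with the number of distinct values, so a sketch like this is only practical for attributes with few categories.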
Keywords/Search Tags: Knowledge Discovery in Databases (KDD), data mining, decision tree, tree pruning