Research And Application On Decision Tree In Data Mining

Posted on: 2009-08-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y Xia
GTID: 2178360272476349
Subject: Software engineering
Abstract/Summary:
Data mining, also called KDD (Knowledge Discovery in Databases), is the process of extracting trustworthy, novel, useful, and understandable patterns from very large amounts of data. Classification is an important problem in data mining and has been applied successfully in a wide range of application areas. Of the many techniques proposed for classification, decision tree classifiers have found the widest applicability in large-scale data mining environments. A decision tree is constructed according to a heuristic rule. A decision tree classifier is a flowchart-like tree structure in which each internal node denotes a test on an attribute (attribute value), each branch represents an outcome of the test, and each leaf node represents a class. To classify a record, the model follows a path from the root to a leaf by applying the attribute test at each node; the class stored at the leaf is the class to which the record belongs.
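The root-to-leaf classification procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the `Node` and `Leaf` classes, the `classify` function, and the toy weather attributes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf node: stores the class assigned to records that reach it."""
    label: str


@dataclass
class Node:
    """An internal node: tests one attribute and has one branch per outcome."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree: Union[Node, Leaf], record: Dict[str, str]) -> str:
    """Follow the path from the root to a leaf and return the leaf's class."""
    node = tree
    while isinstance(node, Node):
        value = record[node.attribute]   # outcome of the test at this node
        node = node.branches[value]      # follow the matching branch
    return node.label


# Hypothetical toy tree (illustrative data only, not taken from the thesis).
TREE = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Node("wind", {"strong": Leaf("no"), "weak": Leaf("yes")}),
})

print(classify(TREE, {"outlook": "sunny", "humidity": "normal"}))
```

The sketch assumes every attribute value seen at classification time has a branch in the tree; a production classifier would need a fallback (for example, the majority class at the node) for unseen values.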
The ID3 and C4.5 algorithms, based on information theory, and the CART, SLIQ, and PUBLIC methods, based on the lowest Gini index, are very common in decision tree construction. This thesis is based on the actual development of an epidemic prevention and child immunization management system. Starting from the practical application, it investigates several decision tree classification algorithms, points out their shortcomings, and improves the ID3 algorithm. It then implements the decision support module; the whole system also implements an interface to the nationwide child immunization information management monitoring system.
To address the shortcoming that ID3 tends to choose attributes with many values as test attributes, the thesis proposes an optimized algorithm that introduces a parameter P to constrain attribute selection. During decision tree learning, in addition to the example set used to create and modify the tree, all rules influence the generation of the tree and the selection of test attributes. Test attributes are therefore chosen more reasonably, large volumes of data are kept from masking small ones, and the tree's dependence on many-valued attributes is reduced, which resolves the bias of ID3 toward attributes with many values.
At present, most child immunization systems on the market are based on a C/S structure with distributed data storage, and children carry their own vaccination cards. Data cannot be shared, and these systems cannot handle vaccination at different locations, automatic calculation of vaccination rates, or intelligent decision analysis, which are major problems that management urgently needs solved. With the rise of Internet technology, application software architecture has been moving from the C/S (client/server) structure to the B/S (browser/server) structure; B/S is an improvement on, or a variation of, the C/S structure.
In this structure, the user interface is implemented entirely through the web browser. A small part of the business logic is implemented on the front end, while the main business logic is implemented on the server side, forming the so-called three-tier structure. The B/S structure takes advantage of mature, widespread browser technology to deliver functionality that previously required complex dedicated software, and it reduces development costs; it is a new kind of software system architecture. This structure has become the first choice for today's application software architecture.
The system is based on the B/S architecture and J2EE technology and uses business intelligence technology and portable vaccination technology. It successfully resolves problems such as centralizing scattered data, synchronizing vaccinations given at different locations, the difficulty of compiling statistics on vaccinations that are due, creating cards for migrant children, and, in rural areas, "orally reported vaccination" and "false coverage".
A data warehouse collects the various data related to child immunization and disease prevention and control, and data mining and OLAP analysis tools are applied to the warehoused data for analysis and decision-making. BI is used in the child immunization and disease prevention and control management system to provide managers with decision support. Because decision tree classification has many advantages, the system studies and improves this method and applies it to concrete data, so that the algorithm adapts better to classification and rule mining on large-scale data. The decision support module combines statistical analysis and prediction and so better meets decision-makers' information needs. The thesis discusses, compares, and analyzes several decision tree classification algorithms and identifies a new application setting for classification for further research.
This thesis is an attempt made after studying the theory of decision tree classification algorithms. For various reasons, the system has not yet met the design requirements of a decision support system. The next steps of the research and development work are to continue the in-depth study of decision tree theory, to mine further the knowledge hidden in the database, to explore better analysis and modeling approaches, and to improve the system. The current development of data mining techniques demonstrates the feasibility and advantages of designing and developing decision support systems in this area. The system is still at a preliminary stage of research and trial use, and much work remains to refine it so that it meets users' needs.
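The bias of ID3 toward many-valued attributes, which the abstract sets out to correct, can be illustrated with a small sketch. The thesis's own correction (the parameter P) is not specified in the abstract, so the code below does not attempt to reproduce it. It shows ID3's information gain alongside the gain ratio used by C4.5, a well-known alternative remedy for the same bias. The function names and the toy data set (with a unique `record_id` column) are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(rows, labels, attr):
    """ID3 criterion: class entropy minus weighted entropy after splitting."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value; it cannot split the data
    return information_gain(rows, labels, attr) / split_info


# Hypothetical toy data: record_id is unique per row, region has two values.
ROWS = [
    {"record_id": "1", "region": "urban"},
    {"record_id": "2", "region": "urban"},
    {"record_id": "3", "region": "urban"},
    {"record_id": "4", "region": "rural"},
    {"record_id": "5", "region": "rural"},
    {"record_id": "6", "region": "rural"},
]
LABELS = ["yes", "yes", "yes", "no", "no", "yes"]

for attr in ("record_id", "region"):
    print(attr,
          round(information_gain(ROWS, LABELS, attr), 3),
          round(gain_ratio(ROWS, LABELS, attr), 3))
```

On this toy data, information gain ranks the unique `record_id` column above `region` even though an identifier has no predictive value, while gain ratio reverses the ranking because it penalizes splits into many small partitions.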
Keywords/Search Tags: Data Mining, Classification, Decision Tree, ID3