The Research Of Decision Tree Algorithm In Data Mining

Posted on: 2003-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: X G Hu
GTID: 2168360092466616
Subject: Computer application technology
Abstract/Summary:
Data mining, also called KDD (Knowledge Discovery in Databases), is an advanced process for extracting trustworthy, novel, useful and understandable patterns from very large amounts of data. Classification is an important problem in data mining. Its goal is to build a classification model: given a database of records, each with a class label, the model generates a concise and meaningful description of each class that can then be used to classify subsequent records, and therefore to make predictions. Classification has been applied successfully in a wide range of areas, such as medical diagnosis, weather prediction, credit approval, customer segmentation, and fraud detection. Building a model with good classification ability is of primary importance for practical use. This thesis first focuses on methods for evaluating the accuracy of a classification model.

Many different techniques have been proposed for classification, and among them decision tree classifiers have found the widest applicability in large-scale data mining environments. There are several reasons for this. First, decision trees offer a very intuitive representation that is easy to assimilate and to translate into standard database queries. Second, decision tree induction is efficient and thus suitable for large training sets. Furthermore, decision tree generation algorithms require no information beyond that already contained in the training data. Finally, the accuracy of decision tree classifiers is comparable or even superior to that of other classification techniques. Second, this thesis focuses on decision tree classifiers.

The explosive growth of databases makes the scalability of data mining techniques increasingly important. Traditionally, algorithms for data analysis assume that the input data contains relatively few records. Current databases, however, are much too large to be held in main memory, and retrieving data from disk is markedly slower than accessing data in RAM. Thus, to be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if, given a fixed amount of main memory, its runtime increases linearly with the number of records in the input database. Third, this thesis focuses on scaling decision tree algorithms to very large data sets.

Databases can support data mining in terms of both convenience and efficiency. Building mining applications over relational databases or OLAP warehouses is nontrivial: it requires different customized data mining algorithms and significant work on the part of application builders. OLE DB for Data Mining (OLE DB for DM) is a natural evolution from OLE DB and OLE DB for OLAP. Microsoft's OLE DB for DM specification was developed to make data mining accessible to application builders through a single established API. Its goal is to ease the burden of developing mining applications in large relational databases. Since OLE DB for DM assumes no internal details of the database, data mining implementations become more portable across various DBMSs. Finally, this thesis focuses on how to develop decision tree applications with OLE DB for DM...
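
As an illustration of the decision tree induction step discussed in the abstract, the following minimal Python sketch chooses the attribute on which to split a node by information gain. The abstract does not say which splitting criterion the thesis uses, so the ID3-style entropy criterion here is an assumption. The helper names (`entropy`, `information_gain`, `best_split`) and the toy credit-approval records are hypothetical and serve only as an example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(records, labels, attribute):
    """Entropy reduction obtained by partitioning on one categorical attribute."""
    partitions = {}
    for record, label in zip(records, labels):
        partitions.setdefault(record[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_split(records, labels):
    """Return the attribute with the highest information gain."""
    attributes = records[0].keys()
    return max(attributes, key=lambda a: information_gain(records, labels, a))


# Hypothetical toy training set: each record carries a class label.
records = [
    {"income": "high", "student": "no"},
    {"income": "high", "student": "yes"},
    {"income": "low", "student": "no"},
    {"income": "low", "student": "yes"},
]
labels = ["reject", "approve", "reject", "approve"]

print(best_split(records, labels))
```

In this toy set, `student` separates the two classes completely while `income` does not separate them at all, so the sketch selects `student`. A full tree builder would apply the same choice recursively to each partition. The scalable algorithms the abstract refers to must additionally avoid holding all records in main memory, which this sketch does not attempt.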
Keywords/Search Tags: Knowledge discovery in database, Data mining, Classification, Decision tree, Scalability of decision tree, OLE DB for DM