The Decision Tree Algorithm Based On Large Databases And Implementation

Posted on:2008-05-05

Degree:Master

Type:Thesis

Country:China

Candidate:H Chang

Full Text:PDF

GTID:2208360215966878

Subject:Computer application technology

Abstract/Summary:


Classification is one of the fundamental tasks in data mining and machine learning. It is used to analyze large volumes of related data and to build classification models for problems in many areas. Classification techniques are widely applied in scientific research, communications, finance, and other fields. The decision-tree classifier is an important model in the knowledge discovery process. Its good interpretability, fast classification speed, and strong classification performance have made it a research focus in data mining and machine learning.

The classic decision-tree learning system is ID3, which uses a divide-and-conquer approach to induce the tree from the root to the leaves and chooses splitting attributes by information gain. This method tends to produce a simple tree. However, ID3 handles only nominal attributes, not numeric ones, and it usually overfits the training data. The C4.5 algorithm is an extension of ID3: it extends classification from nominal attributes to numeric attributes, and it addresses overfitting by pruning the decision tree. It is now regarded as a better decision-tree classifier.

In real applications, decision trees are built from large databases holding massive amounts of data. How to integrate decision-tree construction with database technology is a problem worth studying, and many earlier algorithms have been re-examined and extended for this purpose. This paper focuses on a scalable classification algorithm that tightly integrates decision-tree construction with database technology. We use SQL to implement data preprocessing and the computation of the attribute selection measure, and we store the decision tree in a relational database.
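The information-gain criterion that ID3 uses to choose a splitting attribute can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`entropy`, `information_gain`, `best_attribute`) and the toy rows are assumptions made for illustration, and only nominal attributes are handled, as in ID3.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by partitioning rows on a nominal attribute."""
    base = entropy([row[target] for row in rows])
    # Group the class labels by the value of the candidate attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


def best_attribute(rows, attributes, target):
    """Pick the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy training set (not taken from the thesis).
ROWS = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]

if __name__ == "__main__":
    for name in ("outlook", "windy"):
        print(name, information_gain(ROWS, name, "play"))
    print("split on:", best_attribute(ROWS, ["outlook", "windy"], "play"))
```

In this toy set, `outlook` separates the classes completely while `windy` carries no information, so a divide-and-conquer learner would split on `outlook` first and then recurse on each partition.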
In this paper, both the training set used to build the decision tree and its subsets are defined as views. During tree construction, the main computation is carried out in SQL, the standard database language. The view-based classification algorithm makes use of the processing capacity of a large database and is easy to implement. At the end of the paper, an experiment was designed using the KDD CUP 2004 data. The data was loaded into a relational database and preprocessed with SQL, and the decision tree was built and stored in the database. The experiment shows that building a decision tree with the processing capacity of a large database is feasible and efficient.
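The view-based idea described above can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption for illustration, not the thesis's design: the table `training`, the view `node_sunny`, the `tree_node` schema, and the helper names are hypothetical, and SQLite stands in for the large relational database. The database does the counting with `GROUP BY`; the host program only combines the counts into an information-gain value.

```python
import math
import sqlite3


def class_counts_by_value(conn, source, attribute, target):
    """Let the database count rows per (attribute value, class) pair.

    `source` may be a table or a view. Identifiers are interpolated directly,
    so they must come from trusted code, never from user input.
    """
    sql = (
        f'SELECT "{attribute}", "{target}", COUNT(*) FROM "{source}" '
        f'GROUP BY "{attribute}", "{target}"'
    )
    counts = {}
    for value, label, n in conn.execute(sql):
        counts.setdefault(value, {})[label] = n
    return counts


def entropy_from_counts(counts):
    """Shannon entropy, in bits, from an iterable of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)


def information_gain(conn, source, attribute, target):
    """Information gain of `attribute` on the rows visible through `source`."""
    by_value = class_counts_by_value(conn, source, attribute, target)
    class_totals = {}
    for per_class in by_value.values():
        for label, n in per_class.items():
            class_totals[label] = class_totals.get(label, 0) + n
    total = sum(class_totals.values())
    remainder = sum(
        sum(per_class.values()) / total * entropy_from_counts(per_class.values())
        for per_class in by_value.values()
    )
    return entropy_from_counts(class_totals.values()) - remainder


def build_demo():
    """Create a hypothetical in-memory database with one subset view."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE training (outlook TEXT, windy TEXT, play TEXT)")
    conn.executemany(
        "INSERT INTO training VALUES (?, ?, ?)",
        [
            ("sunny", "no", "yes"), ("sunny", "no", "yes"),
            ("sunny", "yes", "no"), ("sunny", "yes", "no"),
            ("rainy", "no", "yes"), ("rainy", "yes", "yes"),
        ],
    )
    # A node's training subset is a view: no rows are copied.
    conn.execute(
        "CREATE VIEW node_sunny AS SELECT * FROM training WHERE outlook = 'sunny'"
    )
    # Hypothetical table for storing the tree itself in the database.
    conn.execute(
        "CREATE TABLE tree_node (node_id INTEGER PRIMARY KEY, "
        "source_view TEXT, split_attribute TEXT)"
    )
    conn.execute("INSERT INTO tree_node VALUES (1, 'node_sunny', 'windy')")
    return conn


if __name__ == "__main__":
    conn = build_demo()
    print(information_gain(conn, "node_sunny", "windy", "play"))
```

Defining each node's subset as a view means a child node is described by one extra `WHERE` condition rather than by a copied table, which is one plausible reading of how the approach keeps the heavy work inside the database engine.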

Keywords/Search Tags:

data mining, decision tree, view, SQL


Related items

1	The Research And Implementation Of Tree-Based Data Mining Algorithm
2	Research Of Data Mining Technique Based On Complex Structure
3	Research And Development On Decision Support System Of Real Estate Based On Data Mining
4	Data Mining Technology Applied Research, In The Center Of Television News
5	Freight Invoice Based On Decision Tree Data Mining System
6	The Research Of Decision Tree Algorithm In Data Mining
7	Studies On Algorithms Of Association Rule Mining In Data Mining
8	Study On Association Rules Mining Algorithm Based On FP-tree
9	Frequent Subtree Mining Application In Xml Mining
10	The Research Of Data Mining In Mobile Communication Enterprise Based On Decision Tree