Research Of Decision Tree Algorithm And Its Application On Employment Of Undergraduate Students

Posted on:2013-04-09

Degree:Master

Type:Thesis

Country:China

Candidate:X F Liu

GTID:2248330395986416

Subject:Systems analysis and integration

Abstract/Summary:

Data mining is the process of extracting implicit, potentially useful information and knowledge from large amounts of data. This knowledge can then be expressed as understandable rules for people to use. Classification is an important topic in data mining research: it finds a model that describes data classes and concepts, and uses that model to predict the class labels of unknown objects.

The decision tree is among the most common classification methods. It is fast to compute, easy to understand, and easy to convert into classification rules, so it is widely used. However, existing decision tree algorithms have many deficiencies, such as the handling of missing attribute values, the multi-valued bias in attribute selection, and the processing of continuous-valued attributes. Further improving the performance and classification accuracy of decision trees, and making them better suited to the requirements of data mining applications, therefore has important theoretical and practical significance.

This thesis studies decision trees in depth, explores optimizations of decision tree classification algorithms, and applies the decision tree method to classify and mine a data warehouse of graduates.

The main work of this thesis is as follows:

First, the thesis describes the basic theory of data mining and classification techniques and the fundamentals of decision trees, and analyzes and compares several typical decision tree algorithms, namely ID3, C4.5, SLIQ, and SPRINT, with a focus on ID3 and C4.5.

Second, the thesis gives a detailed analysis of missing attribute values, the multi-valued bias in attribute selection, the processing of continuous-valued attributes, and attribute reduction in decision tree algorithms, and proposes optimization methods to address these problems.

Third, most decision tree algorithms require every value in the data set to be present and accurate. Improper handling of missing data can accumulate a large number of errors, which reduces the performance of the algorithm and increases the computation time and complexity of subsequent processing. This thesis uses the similarity between samples to generate predicted values from the complete part of the data set and fills in the missing data with them. In tests on selected data sets against other algorithms, under a variety of missing-data ratios, the optimized method performed best in efficiency, accuracy, and other respects.

Fourth, the thesis collects student information data from a college. Based on the characteristics of the student repository, it extracts, transforms, and loads (ETL) the heterogeneous data sources to construct a student employment data warehouse for decision tree classification mining.

Fifth, the thesis uses the student employment data warehouse as the data source for analysis and applies the optimized algorithm to analyze college student employment, describing in detail the whole process of applying decision tree classification to this analysis.
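
The multi-valued bias in attribute selection named in the abstract can be illustrated with a small sketch. ID3 picks the attribute with the highest information gain, which tends to favor attributes with many distinct values; C4.5 divides that gain by the split information to obtain the gain ratio. The code below is not taken from the thesis: the toy records and the attribute names `student_id` and `internship` are hypothetical and serve only as an illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def split_stats(values, labels):
    """Return (information gain, gain ratio) for splitting labels by values."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder        # criterion used by ID3
    split_info = entropy(values)              # entropy of the attribute itself
    ratio = gain / split_info if split_info > 0 else 0.0  # criterion used by C4.5
    return gain, ratio


# Hypothetical toy records (assumption, not data from the thesis).
employed = ["yes", "yes", "yes", "yes", "no", "no", "no", "yes"]
student_id = list(range(8))                   # unique per record
internship = ["y", "y", "y", "y", "n", "n", "n", "n"]

id_gain, id_ratio = split_stats(student_id, employed)
intern_gain, intern_ratio = split_stats(internship, employed)

print(f"student_id: gain={id_gain:.3f} ratio={id_ratio:.3f}")
print(f"internship: gain={intern_gain:.3f} ratio={intern_ratio:.3f}")
```

On this toy data, information gain ranks the identifier-like attribute first even though it cannot generalize, while gain ratio ranks the internship attribute first.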

Keywords/Search Tags:

Data mining, Decision Tree, Algorithm, ETL, Data Warehouse

Related items

1	Research And Application Of Decision Tree Algorithm Based On Data Warehouse
2	The Application Of Data Warehouse,OLAP And Data Mining In Management And Decision Of Reserve Officers
3	Based On The Data Warehouse, Data Mining Technology Research And Application In The Real Estate Information Analysis System
4	Data Warehouse Based Generic Multi-Strategy Data Mining Tool-MSMiner
5	Research On Users’ Repurchase Behavior Based On Data Mining And Data Warehouse
6	Research Of Data Warehouse And Data Mining On Telecom Network Resource Management System
7	Application On The Techniques Of Data Warehouse And Data Mining In CRM Of The Bank
8	To Achieve The Design Of Data Warehouse And Data Mining Technology In The Banking Industry
9	Application For Data Mining Based On Decision-tree Algorithm
10	Based On The Data Warehouse Qinghai Statistics Decision Analysis Support System Design And Implementation