
Research Of Decision Tree Algorithm And Its Application On Employment Of Undergraduate Students

Posted on: 2013-04-09
Degree: Master
Type: Thesis
Country: China
Candidate: X F Liu
GTID: 2248330395986416
Subject: Systems analysis and integration
Abstract/Summary:
Data mining is the process of extracting implicit, potentially useful information and knowledge from large amounts of data. The extracted knowledge can be expressed as understandable rules for people to use. Classification is an important topic in data mining research: it finds a model that describes data classes and concepts, and uses that model to predict the class labels of unknown objects. The decision tree is the most commonly used classification method. It is fast to compute, easy to understand, and easy to convert into classification rules, so it is widely used in practice. However, existing decision tree algorithms have several deficiencies, such as the handling of attributes with missing values, the bias toward multi-valued attributes in attribute selection, and the processing of continuous-valued attributes. Further improving the performance of decision trees, raising their classification accuracy, and making them better suited to the requirements of data mining applications therefore has important theoretical and practical significance. This thesis studies decision trees in depth, explores optimizations of the decision tree classification algorithm, and uses the decision tree method to classify and mine a data warehouse of graduate students.
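To make the multi-valued bias mentioned above concrete, the following minimal sketch compares the information gain criterion used by ID3 with the gain ratio criterion used by C4.5. The toy data and the attribute names (`student_id`, `internship`, `employed`) are hypothetical and are not taken from the thesis; the code illustrates the two standard criteria, not the thesis's optimized algorithm.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """ID3 criterion: entropy reduction obtained by splitting on an attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split the data
    return information_gain(values, labels) / split_info


# Hypothetical toy data: eight students and whether each found employment.
employed = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
student_id = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]  # unique per record
internship = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]

for name, column in [("student_id", student_id), ("internship", internship)]:
    print(name,
          round(information_gain(column, employed), 3),
          round(gain_ratio(column, employed), 3))
```

On this toy data, information gain prefers `student_id`, which has a distinct value for every record and therefore cannot generalize, while gain ratio prefers `internship`. Dividing by the split information is what penalizes attributes with many values.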
The main work of this thesis is as follows:

First, the thesis describes the basic theory of data mining and classification techniques and the fundamentals of decision trees. It analyzes and compares several typical decision tree algorithms, namely ID3, C4.5, SLIQ, and SPRINT, with a focus on ID3 and C4.5.

Second, the thesis gives a detailed analysis of four problems in decision tree algorithms: attributes with missing values, the bias toward multi-valued attributes in attribute selection, the processing of continuous-valued attributes, and attribute reduction. It then proposes optimization methods to address them.

Third, most decision tree algorithms require every value in the data set to be present and accurate, because improper handling of missing data accumulates errors, which reduces the performance of the algorithm and increases the computation time and complexity of subsequent processing. This thesis uses the similarity between samples to generate predicted values from the complete records in the data set and fills in the missing data with them. In tests on selected data sets across a range of missing-data ratios, the optimized method performed best compared with other algorithms in efficiency, accuracy, and other respects.

Fourth, the thesis collects the student information data of a college. Based on the characteristics of the student repository, it extracts, transforms, and loads (ETL) the heterogeneous data sources to construct a student employment data warehouse for decision tree classification mining.

Fifth, the thesis uses the student employment data warehouse as the data source for analysis and applies the optimized algorithm to the problem of college student employment, describing in detail the whole process of applying decision tree classification to that analysis.
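The similarity-based filling of missing data described in the third point can be sketched as follows. The abstract does not give the exact similarity measure or prediction rule, so this sketch makes its own assumptions: similarity is the fraction of a record's known attributes that match a complete record, and each missing value is filled with the most common value among the `k` most similar complete records. The function names, the `k` parameter, and the toy records are hypothetical.

```python
from collections import Counter

MISSING = None  # marker for a missing attribute value (an assumption of this sketch)


def similarity(record, other):
    """Fraction of the record's known attributes on which both records agree."""
    known = [i for i, value in enumerate(record) if value is not MISSING]
    if not known:
        return 0.0
    return sum(record[i] == other[i] for i in known) / len(known)


def fill_missing(records, k=3):
    """Fill missing values using the k most similar complete records."""
    complete = [r for r in records if MISSING not in r]
    filled = []
    for record in records:
        if MISSING not in record or not complete:
            filled.append(list(record))  # nothing to fill, or nothing to learn from
            continue
        # Rank complete records by similarity, most similar first.
        neighbours = sorted(complete,
                            key=lambda c: similarity(record, c),
                            reverse=True)[:k]
        new_record = list(record)
        for i, value in enumerate(record):
            if value is MISSING:
                # Predicted value: the most common value among the neighbours.
                votes = Counter(n[i] for n in neighbours)
                new_record[i] = votes.most_common(1)[0][0]
        filled.append(new_record)
    return filled


# Hypothetical records: (major, gpa_band, internship, employed)
records = [
    ["cs", "high", "yes", "yes"],
    ["cs", "high", "yes", "yes"],
    ["math", "low", "no", "no"],
    ["math", "low", "no", "no"],
    ["cs", "high", None, "yes"],
    ["math", None, "no", "no"],
]

filled = fill_missing(records, k=2)
for row in filled:
    print(row)
```

Filling from the most similar complete records keeps the imputed values consistent with the patterns already present in the data, which is the intuition behind the similarity principle. A global default value, by contrast, ignores those patterns.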
Keywords/Search Tags: Data mining, Decision Tree, Algorithm, ETL, Data Warehouse