
Research And Application Of Decision Tree Algorithm In Employment Information Warehouse Of Graduate Student

Posted on: 2011-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: Q Wang
Full Text: PDF
GTID: 2178360305983009
Subject: Information management and information systems
Abstract/Summary:
Data mining is a product of the rapid development of information technology and the diversification of the means by which people access data. It is the process of extracting implicit, potentially useful information and knowledge from large amounts of data. Its main tasks include association analysis, cluster analysis, classification, prediction, time-series pattern discovery, and deviation analysis. Classification is an important topic in data mining research. Many methods are currently used for data classification, such as decision trees, neural networks, the k-nearest neighbor (k-NN) method, rough sets, and statistical models. Among them, the decision tree is the most commonly used classification method. It is fast to compute, easy to understand, and easy to convert into classification rules, and it is therefore widely used in medical diagnosis, weather forecasting, credit auditing, business prediction, case detection, and other fields. Existing decision tree algorithms nevertheless have several deficiencies, such as bias toward multi-valued attributes during attribute selection, the handling of missing attribute values, and the handling of continuous-valued attributes. Improving the performance and classification accuracy of decision trees, and making them better suited to the requirements of data mining applications, is therefore of both theoretical and practical significance. This thesis studies these deficiencies further, explores optimizations of the decision tree classification algorithm, and examines how to use the decision tree method to classify and mine a data warehouse of graduate student information.
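The multi-valued attribute bias mentioned above can be illustrated with a minimal sketch. It uses the standard textbook definitions of information gain (the ID3 criterion) and gain ratio (the C4.5 criterion), not the improved algorithm proposed in the thesis. The toy records and the column names `student_id`, `degree`, and `employed` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in a list."""
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())


def partition(values, labels):
    """Group class labels by the attribute value of each record."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(values, labels):
    """ID3 criterion: entropy reduction obtained by splitting on the attribute."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)  # entropy of the attribute's own values
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: 8 graduates, class label = employed or not.
employed = ["yes", "yes", "yes", "yes", "yes", "no", "no", "no"]
student_id = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]  # unique per record
degree = ["phd", "phd", "phd", "phd", "msc", "msc", "msc", "msc"]

for name, column in [("student_id", student_id), ("degree", degree)]:
    print(name,
          "gain =", round(information_gain(column, employed), 3),
          "gain ratio =", round(gain_ratio(column, employed), 3))
```

On this toy data, information gain ranks the unique `student_id` column above `degree` even though an identifier cannot generalize to new records, while gain ratio reverses the ranking because it penalizes attributes with many distinct values.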
The main work of this thesis is as follows:
First, the thesis describes the theoretical basis of data mining and classification techniques and the fundamentals of decision trees, and it analyzes and compares several common decision tree algorithms: the classical ID3 algorithm; the C4.5 algorithm, which overcomes ID3's bias toward multi-valued attributes; the CART algorithm, which uses the Gini index as its attribute selection criterion; and the SLIQ algorithm, which offers good scalability and parallelism.
Second, the thesis analyzes in detail several issues in existing decision tree algorithms, namely missing attribute values, attribute-value bias, the handling of continuous-valued attributes, attribute reduction, and attribute selection criteria, and puts forward concrete solutions.
Third, based on the characteristics of university graduate student information databases, heterogeneous data sources are extracted, transformed, and loaded to build a graduate employment data warehouse for classification mining.
Fourth, the thesis improves the ID3 algorithm and proposes a new decision tree algorithm based on user interest degree and simplified information entropy. In comparison, the new algorithm outperforms the traditional ID3 algorithm in overall performance. The improved algorithm is applied to a university graduate employment information database, where it provides decision support for the university career center and demonstrates its practical value.
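As an illustration of two ideas named above, the Gini index used by CART and the handling of a continuous-valued attribute, the following minimal sketch searches for the binary threshold that minimizes the weighted Gini index. It follows the standard textbook procedure and is not taken from the thesis. The `gpa` and `employed` columns and the helper name `best_threshold` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    best = (None, float("inf"))
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # equal neighbors give no boundary to split on
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / total * gini(left) + len(right) / total * gini(right)
        if score < best[1]:
            best = ((low + high) / 2, score)  # midpoint between the two neighbors
    return best


# Hypothetical toy data: a continuous attribute and a binary class label.
gpa = [2.1, 2.8, 3.0, 3.2, 3.5, 3.7, 3.9, 2.5]
employed = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]

threshold, score = best_threshold(gpa, employed)
print("threshold =", threshold, "weighted Gini =", score)
```

Candidate thresholds are taken as midpoints between adjacent sorted values, a common convention. Production implementations typically update class counts incrementally instead of rebuilding the two partitions at each candidate.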
Keywords/Search Tags: Data mining, Decision tree, ID3 algorithm, Simplified entropy, User interest degree