
The Research And Application Of The Improved C4.5 Algorithm In The Analysis Of College Students' Emotional Quality

Posted on: 2019-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: L F Miao
Full Text: PDF
GTID: 2357330548955582
Subject: Software engineering
Abstract/Summary:
The C4.5 decision tree algorithm is a classification algorithm used in data mining. It has been widely adopted because its underlying idea is simple and its classification rules are easy to extract and understand. Traditional C4.5 classifies well when the classes in the data set are relatively balanced. On an unbalanced data set, however, the minority classes contain too few samples to give the classifier enough classification information, and, in pursuit of overall classification accuracy, the classifier tends to favor the majority class and neglect the minority classes. As a result, when C4.5 classifies an unbalanced data set, overall accuracy can be high while accuracy on the minority classes is very low.

To address this problem, this thesis reviews the current state of research and related solutions in China and abroad and, building on that earlier work, proposes an improved C4.5 algorithm called MR_C4.5. First, the algorithm computes, for each attribute, the interval between the minimum and maximum values taken by the minority class, called the minority class range. Then, treating each attribute as a candidate splitting point, it calculates the information gain ratio over the corresponding minority class range. Finally, it compares the information gain ratios of the candidates and selects the one with the maximum value as the actual splitting point used to build the decision tree. In effect, MR_C4.5 improves minority class accuracy mainly by reducing the number of majority class instances that lie outside the minority class range.

The C4.5 decision tree algorithm is also applied to the analysis of the affective diatheses (emotional quality) of contemporary college students. Because these data are unevenly distributed across classes, the classification performance of the improved MR_C4.5 algorithm is examined on them. To balance prediction accuracy, simplicity of description, and avoidance of over-fitting, the college students' affective diatheses data set is preprocessed before the decision tree is built. Preprocessing includes cleaning, transformation, and reduction, with the data reduced using the significance test of a multiple linear regression model.

Three groups of experiments are designed to examine the application of the improved C4.5 algorithm to the analysis of college students' affective diatheses. The results show that: (1) the C4.5 algorithm can be used to analyze college students' affective diatheses and classifies the data well overall, but it performs poorly on the minority classes; (2) appropriate data preprocessing effectively improves the overall performance of the decision tree model; (3) the MR_C4.5 algorithm handles unbalanced data sets better than the traditional C4.5 algorithm. For class “A”, which has the smallest sample size, the F-measure increases by about 9% on average.
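
The three MR_C4.5 steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract gives no code, so the function names (`minority_range`, `gain_ratio`, `best_split`), the restriction to numeric attributes, the binary threshold splits at midpoints, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` (standard C4.5 definition)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    gain = (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def minority_range(values, labels, minority):
    """Step 1: interval between the min and max values of the minority class."""
    vals = [x for x, y in zip(values, labels) if y == minority]
    return min(vals), max(vals)


def best_split(columns, labels, minority):
    """Steps 2-3: score candidate thresholds using only the instances inside
    each attribute's minority class range, and keep the best one.
    Assumption: candidates are midpoints between adjacent distinct values."""
    best = (None, None, 0.0)  # (attribute name, threshold, gain ratio)
    for name, values in columns.items():
        lo, hi = minority_range(values, labels, minority)
        kept = [(x, y) for x, y in zip(values, labels) if lo <= x <= hi]
        xs = [x for x, _ in kept]
        ys = [y for _, y in kept]
        distinct = sorted(set(xs))
        for a, b in zip(distinct, distinct[1:]):
            threshold = (a + b) / 2
            ratio = gain_ratio(xs, ys, threshold)
            if ratio > best[2]:
                best = (name, threshold, ratio)
    return best


# Hypothetical toy data: "A" is the minority class.
columns = {"score": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}
labels = ["B", "B", "B", "A", "A", "B", "A", "B", "B", "B"]

print(minority_range(columns["score"], labels, "A"))
print(best_split(columns, labels, "A"))
```

In this toy example the six majority class instances outside the range are ignored when the split is scored, which mirrors the abstract's remark that the method works by reducing majority class instances outside the minority class range.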
Keywords/Search Tags: decision tree, C4.5, data preprocessing, data imbalance, college students' affective diatheses