
Research And Application Of Two Typical Classification Algorithms In Data Mining

Posted on: 2019-05-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Song
Full Text: PDF
GTID: 2428330548469804
Subject: Probability theory and mathematical statistics
Abstract/Summary:
Classification is an important method in data mining in which a classification function or model is constructed from given data to assign the correct category to the data or objects to be classified. This thesis makes the following main contributions to the study of classification.

First, a multi-parameter kNN classification algorithm is proposed to address the low classification accuracy of the kNN algorithm on multi-parameter data. For quantitative data, a comprehensive index model is built and applied to a hydrological drought example; the results show that samples can be accurately classified as drought, normal, or flood. For data with fuzzy information, such as interval data, a method for constructing the closeness degree of intervals is proposed and verified on medical diagnosis data; the results show that the method can accurately diagnose the patient's condition.

Second, the classical C4.5 classification algorithm does not consider the correlation between the condition attributes and the decision attribute when selecting split attributes. To address this, a CP-C4.5 classification algorithm based on the correlation coefficient and PCA is proposed. First, the correlation coefficient between each condition attribute and the decision attribute is computed to determine the importance of each condition attribute to the classification. Then, PCA is used to eliminate the influence of correlation among the attributes. Finally, experimental comparisons on UCI data sets show that the proposed CP-C4.5 algorithm improves classification speed while maintaining classification accuracy.

In summary, this thesis first introduces the basic theory of classification and then studies the kNN classification algorithm and the C4.5 decision tree classification algorithm in data mining. Improvements to these two typical classification algorithms are proposed, and examples are used to verify the rationality and effectiveness of the improved methods. One open question concerns the distance metric used in classification: can a new metric be defined to improve classification accuracy? The issue of classification speed also needs further study.
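
The two preprocessing steps named for CP-C4.5 (ranking condition attributes by their correlation with the decision attribute, then using PCA to remove correlation among attributes) can be illustrated with a minimal sketch. The abstract does not give the exact formulas or thresholds, so everything here is an assumption: the function names are hypothetical, the Pearson coefficient is used on a numerically encoded decision attribute, and PCA is done by eigendecomposition of the sample covariance matrix. It is not the thesis's implementation.

```python
import numpy as np


def rank_attributes_by_correlation(X, y):
    """Rank condition attributes (columns of X) by |Pearson r| with y.

    Assumes y is a numerically encoded decision attribute and that no
    column of X is constant (a constant column would give a zero divisor).
    Returns the absolute correlations and the column indices sorted from
    most to least correlated.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.abs(Xc.T @ yc / denom)
    order = np.argsort(-r)
    return r, order


def pca_decorrelate(X, n_components):
    """Project X onto its top principal components.

    The projected columns are mutually uncorrelated, which is the sense
    in which PCA removes correlation among attributes.
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, top]
```

In such a sketch, the ranked and decorrelated attributes would then be passed to an ordinary C4.5 tree builder; how the thesis combines the two steps with the split criterion is not stated in the abstract.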
Keywords/Search Tags:Classification, Data mining, Similarity measure, kNN algorithm, Correlation, C4.5 algorithm