
The Research Of Class Imbalance Classification Model In Data Mining

Posted on: 2015-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: K Liu
Full Text: PDF
GTID: 2298330467475481
Subject: Computer technology
Abstract/Summary:
Much real-world data is unevenly distributed: some classes account for a small proportion of the total samples, while others account for a large proportion. The minority class is often the focus of research, because its samples have a relatively larger impact and people pay more attention to the losses caused when minority class samples are misclassified. However, applying a traditional classification algorithm directly to a class-imbalanced dataset does not give ideal results, and the classification accuracy for the minority class in particular is even worse. Therefore, the study of classification on class-imbalanced datasets has very important practical significance. This thesis presents a classification model that combines an improved SMOTE algorithm with the random forest classification algorithm. At the data level, the proposed NSMOTE algorithm is used to process the class-imbalanced dataset. At the algorithm level, random forest, an ensemble of decision trees, is chosen to classify the processed dataset. Compared with traditional classification, this model is better suited to class-imbalanced datasets.

China is a resource-based country; "more coal, lack of oil, less gas" summarizes the state of the country's resources. Coal resources are clearly very important to China's economic development, and coal mine production safety is a task of colossal importance. Applying advanced data mining techniques to predict danger zones in coal mines can reduce the incidence of sudden disasters and the loss of people's lives and property. Because a danger zone in a coal mine is a small-probability event, this research is a class imbalance problem. The proposed class imbalance classification model is used to analyze a coal mine dataset and to provide reasonable suggestions for coal production, so that disasters can be avoided to some extent.
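The data-level step mentioned in the abstract (oversampling the minority class before a classifier is trained) can be illustrated with a minimal sketch. The abstract does not describe how NSMOTE differs from SMOTE, so the code below shows only the classic SMOTE interpolation idea, not the thesis's algorithm; the function name `smote`, its parameters, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Classic SMOTE-style interpolation (NOT the thesis's NSMOTE variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 minority samples versus 20 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(5.0 + 0.1 * i, 5.0 - 0.1 * i) for i in range(20)]

# Generate enough synthetic minority samples to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))
```

In a pipeline of the kind the abstract describes, the rebalanced samples would then be used to train a random forest (for example scikit-learn's `RandomForestClassifier`); that step is omitted here to keep the sketch free of third-party dependencies.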
Keywords/Search Tags: data mining, class imbalance data set, classification model, coal mine