
For The Non-equilibrium Hybrid Data Classification And Its Application

Posted on: 2009-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Z Chen
GTID: 2208360245483026
Subject: Computer application technology
Abstract/Summary:
Imbalanced mixed data are very common in the real world: their classes are unevenly distributed and their attributes are of mixed types. Traditional classification methods are not very effective on such data, and when the minority-class samples are important, misclassifying them can cause large losses. Methods for processing imbalanced mixed data have therefore become one of the focal points of current data-mining research in China and abroad.

This thesis builds on traditional classification methods and improves them so that they can handle imbalanced mixed data. Analysis of the k-nearest-neighbours-by-counting algorithm (CwkNN) shows that it classifies mixed data effectively, but its performance on imbalanced data is unsatisfactory. The thesis therefore combines the characteristics of imbalanced data with CwkNN and proposes three improved classification methods:

(1) Overall density classification algorithm: because CwkNN cannot handle imbalanced data, an overall density is introduced to rebalance the influence of each class on classification. Experiments show that it increases the classification accuracy of minority-class samples but reduces that of majority-class samples.
(2) K-local density classification algorithm: to avoid the loss of majority-class accuracy caused by the overall density algorithm, a K-local density is introduced so that minority-class accuracy improves while majority-class accuracy does not decrease. Experiments show that it effectively increases classification accuracy on imbalanced data.
(3) Density-based boundary point detection and classification algorithms: for boundary points in the data, the thesis proposes a density-based boundary point detection method and applies three classification methods for boundary points to classify the detected points. Experiments show that these methods correctly classify imbalanced data containing boundary points.
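The abstract does not give the thesis's distance measure or density formulas, so the following is only a minimal Python sketch under stated assumptions, meant to make the idea behind item (1) concrete. It uses an HEOM-style distance (a standard mixed-attribute metric swapped in here, not necessarily the one CwkNN uses), plain class counting among the k nearest neighbours as a stand-in for CwkNN, and, as a hypothetical reading of "overall density", divides each class's vote count by that class's overall share of the training set. The function names `attribute_ranges`, `heom_distance` and `knn_count_predict` are illustrative only.

```python
from collections import Counter

def attribute_ranges(X):
    """Per-attribute value range; categorical (str) columns get 0.0 (unused)."""
    ranges = []
    for column in zip(*X):
        numbers = [v for v in column if not isinstance(v, str)]
        ranges.append(max(numbers) - min(numbers) if numbers else 0.0)
    return ranges

def heom_distance(a, b, ranges):
    """HEOM-style distance for mixed records (assumed; not taken from the thesis).
    Categorical attributes contribute 0 on a match and 1 on a mismatch;
    numeric attributes contribute |a - b| / range."""
    total = 0.0
    for x, z, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(z, str):
            d = 0.0 if x == z else 1.0
        else:
            d = abs(x - z) / r if r > 0 else 0.0
        total += d * d
    return total ** 0.5

def knn_count_predict(X, y, query, k=3, overall_density=False):
    """Classify `query` by counting class labels among its k nearest neighbours.
    With overall_density=True, each class's vote count is divided by its
    overall share n_c / n (a hypothetical stand-in for the thesis's weighting)."""
    ranges = attribute_ranges(X)
    nearest = sorted(range(len(X)),
                     key=lambda i: heom_distance(X[i], query, ranges))[:k]
    votes = Counter(y[i] for i in nearest)
    if overall_density:
        n, class_sizes = len(y), Counter(y)
        scores = {c: count * n / class_sizes[c] for c, count in votes.items()}
    else:
        scores = dict(votes)
    # Ties go to the class met first among the (distance-sorted) neighbours.
    return max(scores, key=scores.get)

# Toy imbalanced mixed data: (numeric, categorical) records, 8 "neg" vs 2 "pos".
X = [(1.0, "a"), (1.2, "a"), (1.4, "b"), (2.0, "a"), (2.2, "b"),
     (2.4, "a"), (2.6, "b"), (3.0, "a"), (5.0, "b"), (5.2, "b")]
y = ["neg"] * 8 + ["pos"] * 2

print(knn_count_predict(X, y, (4.0, "b"), k=5))                        # neg (3 votes vs 2)
print(knn_count_predict(X, y, (4.0, "b"), k=5, overall_density=True))  # pos (10.0 vs 3.75)
```

On this toy data, plain counting lets the majority class win, while the reweighted scores favour the minority class. This illustrates the trade-off described in item (1): boosting minority votes everywhere can also flip majority-class queries, which is the problem the K-local density variant is said to address.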
Keywords/Search Tags: k-nearest neighbours by counting, imbalanced data, overall density, k-local density, boundary point detection