Font Size: a A A

Research On Class Imbalance Problem In Distance Metric Learning

Posted on:2017-10-16Degree:MasterType:Thesis
Country:ChinaCandidate:J T LiuFull Text:PDF
GTID:2348330491963019Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Distance metric learning learns the distance metric between examples and provides more reliable basis to measure the similarity between different examples. Many data mining algorithms, such as k-nearest neighbor, hierarchical clustering heavily rely on the underlying distance metric for correctly measuring relations among input data, so distance metric learning is a fundamental problem in machine learning and data mining. Class imbalance problem is very common in the real world where some classes have much more examples than the other classes and the minority classes are more important. Distance metric learning algorithms often optimize some loss function to learn the metric. When the input data is class imbalanced, the minority classes have fewer examples while they are more important. A distance metric learning algorithm cannot learn a good metric for minority classes if does not pay more attention to them since the loss of the majority classes dominate. As far as we know, there is no work in current distance metric learning considering how to learn the metric when the input data is imbalanced.This paper studies distance metric learning when the input data is class imbalanced, and achieves the following results:(1) IMLMNN algorithm is proposed based on LMNN to solve the class imbalance problem in distance metric learning. The algorithm gives examples in different classes different weights such that the weights of the examples in the minority class is more bigger than those in the majority classes. This makes the algorithm pay more attention to the minority classes and the distance metric of the minority class are more precise since the increased loss of the minority classes make them have more influence in the learning process.(2) When the data is class imbalanced in classification tasks, it needs to be considered not only when learning the metric but also classifying the examples. This paper uses IMKNN to handle class imbalance problem in classification phrase.The algorithm can also be regarded as a new method to solve the class imbalance problem, and it can solve both two-class and multi-class classification.Experiments on many UCI data sets show significant improvements of G-mean, F-measure and AUC comparing with kNN?PNN?IMKNN?LMNN, also the experiments show its effectiveness when comparing with the most popular algorithms designed to solve the class imbalance problem.
Keywords/Search Tags:Machine learning, Distance metric learning, Class imbalance, KNN, SDP
PDF Full Text Request
Related items