Font Size: a A A

Nearest Neighborhood-Based Rare Category Mining

Posted on:2013-10-21Degree:DoctorType:Dissertation
Country:ChinaCandidate:H HuangFull Text:PDF
GTID:1228330395489252Subject:Computer applications
Abstract/Summary:PDF Full Text Request
In many real-world problems, we usually need to handle with imbalanced data sets, in which the majority categories (majority classes) occupy the vast majority of the data sets and the rare categories (rare classes) have only a few examples. Compared with majority categories, the rare categories are often more interesting to us. For example, in network intrusion detection, among the huge-volume routine traffic, there are a few malicious network activities; in financial security, the most of financial thransactions are legitimate although, there are still several fraudulent and harmful ones. Therefore, how to mining out these rare categories hidden in the data sets is of high value in the theory and the practice.In the literature, the tasks of rare category mining can be classified into two groups, namely (1) discovering at least one example from each rare category as the supporting evidence of the existing of these rare categories;(2) finding out all examples of each rare category with the purpose of better understanding the nature of these rare categories. Usually, the first task of rare category mining is called rare category detection, which can be further divided into two subtasks, namly prior-based and prior-free rare category detection; while the second task of rare category mining can be solved by three types of techniques, namely rare category classification, rare category clustering, and our proposed rare category exploration.Focusing on the two tasks of rare category mining, this paper studies the problems of prior-based and prior-free rare category detection and, for the first time, proposes a novel rare category problem known as rare category exploration with our corresponding mining algorithms. The contributions of this paper are as follows.(1) For the problem of prior-based rare category detection, we propose the first density-insensitive approach, namely RADAR. It helps us detect the boundaries of rare classes by utilizing reverse k-nearest neighbors, and then discovers the examples of rare classes. Extensive experimental results demonstrate that compared with the exsiting approaches, RADAR is more suitable to handle with data sets containing multi-density rare classes because of its density-insensitive property. Furthermore, we propose an extention work of RADAR, known as CATION. In the extention work, in order to increse the probability of discovering examples from rare classes, motivated by the change in the number of reverse k-nearest neighbors of the examples around the boundaries of rare classes, we redesign the method of boundary point selection for rare classes to select rare-class boundary points which are more close to the inner region of each rare class. Extensive experimental results verify that CATION significantly improved the performance of rare category detection, compared against the exsiting approaches.(2) For the problem of prior-free rare category detection, considering that its existing solutions bear high time complexity, we propose a faster solution, namely CLOVER. It can identify rare classes from the other types of data by utilizing mutual k-nearest neighbors. Extensive experimental results demonstrate that, compared with existing approaches, CLOVER reasonably reduces the runtime and, in particular, has a significantly better performance of rare category detection.(3) For the second task of rare category mining, i.e., finding out all examples of each rare class, due to the restrictions and limitations of rare category classification and rare category clustering, we propose, for the first time, the problem of rare category exploration, i.e., exploring a data set from a given rare-class example with the purpose of correctly revealing the remaining examples of the same rare class. By proposing rare category exploration, there is a successive scenario between the two tasks of rare category mining. Specifically, rare category detection helps discover an example from each rare class, and rare category exploration then finds out the remaining examples of each rare class based on the discovered one.(4) For the problem of rare category exploration, we propose a simple, yet effective solution, namely FRANK. It constructs a k-nearest neighbors graph with the given data set, and thus converts the problem of rare category exploration into that of finding a local community in the graph containing all examples of the objective rare class from a starting vertex. The local community then can be detected via a greedy strategy. Extensive experimental results demonstrate that, compared with existing approaches to rare category classification and rare category clustering, FRANK has much better accuracy and recall in terms of rare class discovery.
Keywords/Search Tags:Rare Category Detection, Rare Category Exploration, NearestNeighborhood, Reverse k-Nearest Neighbors, Mutual k-Nearest Neighbors, k-NearestNeighbors Graph, Prior Knowledge
PDF Full Text Request
Related items