
Research on Hierarchical Gravity Model for Classification of Imbalanced Datasets

Posted on: 2017-09-30  Degree: Master  Type: Thesis
Country: China  Candidate: Z B Dong
GTID: 2348330509953992  Subject: Computer software and theory
Abstract/Summary:
Classification is a central topic in data mining and machine learning, and traditional classification methods mainly assume data sets with balanced class distributions. In practical applications, however, imbalanced data sets occur frequently, and the imbalance makes classification harder, directly or indirectly. Traditional methods are no longer suitable for classifying imbalanced data sets. Because imbalanced data set classification is challenging and has wide practical applications, it has attracted growing research interest. At present, solutions fall mainly into three categories: problem-definition-level solutions, data-level solutions and algorithm-level solutions.

After analyzing the main problems in imbalanced data set classification, this thesis introduced two traditional classification methods, the hierarchical classification model and the data gravitation model, and analyzed the feasibility of applying them to imbalanced data: combining them can reduce both the degree of imbalance between classes and the influence of small disjuncts on classification. On this basis, a hierarchical gravity model for the classification of imbalanced data sets, named HDGC, was proposed. The main work of this thesis is as follows:

1. The hierarchical classification model and the data gravitation model were combined so that each absorbs the other's advantages and compensates for the other's weakness: the hierarchical classification model lacks precise classification ability, while the data gravitation model has a high classification cost.
2. Instead of replacing the original samples with newly constructed samples, as the hierarchical classification model does, this thesis regarded the samples that fall into the same hypercube as one data cell. Each data cell was then labelled according to its distance to the boundaries of the different classes, and these labels were used to aid classification.
3. A classification process involving both local and global gravity was used. On one hand, local gravity uses only the training samples nearest to the test sample, which improves classification efficiency when there is enough information to classify it; on the other hand, global gravity exploits as much information as possible to predict the labels of samples that are difficult to recognize.

To demonstrate the effectiveness of HDGC, experiments were conducted on real and synthetic data sets. The experimental results show that HDGC handles the imbalanced data set classification problem properly, with high classification efficiency.
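The abstract does not give HDGC's formulas, so the following Python sketch only illustrates the general idea under stated assumptions. Training samples are grouped into hypercube data cells of side `cell_size` and kept as original samples rather than replaced; each class exerts a gravitation of mass / d^2 on a test point, as in the usual data gravitation model; local gravity is computed from the samples in the test point's own and adjacent cells, with a fallback to global gravity over all samples. The inverse-class-frequency masses, the `margin` test for deciding that local information is sufficient, and every function name are hypothetical choices for illustration, and the boundary-distance labelling of data cells is not modelled.

```python
import itertools
import math
from collections import Counter, defaultdict


def cell_key(x, cell_size):
    """Index of the hypercube (data cell) that contains point x."""
    return tuple(math.floor(v / cell_size) for v in x)


def build_cells(X, y, cell_size):
    """Group the original training samples by hypercube; samples are kept as-is."""
    cells = defaultdict(list)
    for x, label in zip(X, y):
        cells[cell_key(x, cell_size)].append((x, label))
    return cells


def class_masses(y):
    """Hypothetical imbalance correction: a sample's mass is 1 / (size of its class),
    so every class has the same total mass."""
    return {c: 1.0 / n for c, n in Counter(y).items()}


def gravity(x, samples, mass):
    """Per-class gravitation on x: sum of mass / squared distance."""
    force = defaultdict(float)
    for s, label in samples:
        d2 = sum((a - b) ** 2 for a, b in zip(x, s))
        force[label] += mass[label] / max(d2, 1e-12)  # avoid division by zero
    return force


def classify(x, cells, mass, cell_size, margin=2.0):
    """Local gravity first; fall back to global gravity when local is ambiguous."""
    key = cell_key(x, cell_size)
    # Local step: only samples in the 3^d cells formed by x's cell and its neighbours.
    local = [
        item
        for offset in itertools.product((-1, 0, 1), repeat=len(key))
        for item in cells.get(tuple(k + o for k, o in zip(key, offset)), [])
    ]
    force = gravity(x, local, mass)
    ranked = sorted(force.values(), reverse=True)
    # Hypothetical sufficiency test: the strongest class must dominate by `margin`.
    if ranked and (len(ranked) == 1 or ranked[0] >= margin * ranked[1]):
        return max(force, key=force.get)
    # Global step: use every training sample for hard-to-recognise points.
    everything = [item for items in cells.values() for item in items]
    force = gravity(x, everything, mass)
    return max(force, key=force.get)


if __name__ == "__main__":
    X = [(0.1, 0.1), (0.2, 0.3), (0.3, 0.2), (0.25, 0.15), (2.0, 2.0), (2.1, 1.9)]
    y = ["maj", "maj", "maj", "maj", "min", "min"]
    cells = build_cells(X, y, cell_size=0.5)
    mass = class_masses(y)
    print(classify((0.2, 0.2), cells, mass, cell_size=0.5))   # maj (local)
    print(classify((2.05, 2.0), cells, mass, cell_size=0.5))  # min (local)
    print(classify((1.2, 1.2), cells, mass, cell_size=0.5))   # min (global)
```

In the demo, the point (1.2, 1.2) has no training samples in its neighbouring cells, so it is decided by global gravity; there, the inverse-frequency masses let the two minority samples outweigh the four majority samples, which is one simple way a gravitation model can be biased toward the minority class.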
Keywords/Search Tags: Imbalanced data set classification, Hierarchical classification model, Data gravitation model, Small disjuncts