
Research On Imbalanced Data Classification Algorithms Based On Neighborhood Rough Set And Hypernetwork

Posted on: 2016-06-01 | Degree: Master | Type: Thesis
Country: China | Candidate: X Liu
GTID: 2298330452467708 | Subject: Computer technology
Abstract/Summary:
Imbalanced data classification is an active research topic in data mining and machine learning. In an imbalanced dataset, samples are not uniformly distributed across classes: some classes have many more samples than others. Because of this peculiarity, traditional machine learning methods applied to an imbalanced dataset are biased toward the majority class, which gives poor accuracy on the minority class. To improve minority-class accuracy, researchers have proposed methods at several levels. The hypernetwork is a pattern classification method whose advantages are readability and simple implementation.
A hypernetwork is a probabilistic graphical model inspired by molecular networks in the cell. It is a rule-based classifier designed for memory and learning. A hypernetwork consists of many hyperedges, and a single hyperedge can connect more than two vertices. This hyperedge structure lets a hypernetwork represent the relationships among feature attributes effectively. However, a hypernetwork learns from the dataset through its hyperedges, so on imbalanced data the learned hyperedges are also biased toward the majority class. This thesis therefore focuses on increasing the number of minority-class hyperedges in the hypernetwork, or increasing their weight.
The neighborhood rough set model was obtained by applying rough set theory to neighborhood systems. Because it is built from the samples and their neighborhood radii, it readily captures the distribution of the whole imbalanced dataset. Applying it to the hypernetwork model should therefore yield a hypernetwork that performs better on imbalanced data. Building on the neighborhood rough set model, this thesis studies imbalanced data classification based on a neighborhood hypergraph model.
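The abstract does not say how the per-sample radius is computed. The snippet below is a minimal sketch that assumes Euclidean distance and a variable-radius rule δ(x) = d_min(x) + λ·(d_max(x) − d_min(x)), where d_min and d_max are the distances from x to its nearest and farthest other sample. It also shows how neighborhoods induce the lower approximation, upper approximation, boundary region and negative region of one target class. The function names `neighborhood_radii` and `approximations` and the parameter `lam` are hypothetical.

```python
import numpy as np

def neighborhood_radii(X, lam=0.5):
    """Per-sample radius under an ASSUMED rule: d_min + lam * (d_max - d_min)."""
    # Pairwise Euclidean distances between all samples (n x n); needs n >= 2.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    radii = np.empty(len(X))
    for i in range(len(X)):
        others = np.delete(D[i], i)  # distances to every other sample
        radii[i] = others.min() + lam * (others.max() - others.min())
    return D, radii

def approximations(X, y, target, lam=0.5):
    """Split samples into rough-set regions of class `target` via their neighborhoods."""
    D, radii = neighborhood_radii(X, lam)
    # Row i marks the neighborhood of x_i (it always contains x_i itself).
    neighbors = D <= radii[:, None]
    in_target = (y == target)
    lower = np.array([in_target[row].all() for row in neighbors])  # neighborhood fully inside class
    upper = np.array([in_target[row].any() for row in neighbors])  # neighborhood touches class
    boundary = upper & ~lower                                      # mixed neighborhoods
    negative = ~upper                                              # no overlap with class
    return lower, upper, boundary, negative
```

For example, with 1-D points 0, 1, 2, 3, 10, 11 labelled 0, 0, 1, 0, 1, 1 and `lam=0.0`, the isolated minority sample at 2 and its neighbours at 1 and 3 fall into the boundary region of class 1. The samples at 10 and 11 form its lower approximation.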
First, the neighborhood rough set model is used to calculate the radius of each sample. On this basis, each sample generates a fixed number of hyperedges, and the class of each hyperedge is determined from the sample's neighborhood. Then, based on the indiscernibility relation of the neighborhood rough set, the hyperedges are divided into four parts: upper approximation, lower approximation, boundary region and negative region. All hyperedges are then replaced by newly generated ones, except those in the lower approximation and some of those in the boundary region. Finally, the remaining hyperedges are used to classify the test dataset.
Building on the neighborhood hypergraph and combining it with cost-sensitive learning, we propose a second algorithm: a cost-sensitive classification algorithm based on the neighborhood hypergraph. We run experiments with each algorithm and compare the results against other algorithms. The results show that our methods perform better on imbalanced data classification.
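The hyperedge pipeline described above can be sketched as follows. This is a minimal illustration that rests on several assumptions the abstract does not state:

- Each hyperedge stores `order` randomly chosen feature indices, the generating sample's values on them, and that sample's radius. It covers a sample when every selected feature lies within the radius.
- A hyperedge takes its generating sample's class. The thesis instead determines the class from the neighborhood.
- "Purity" (the share of covered training samples that share the hyperedge's class) stands in for the rough-set partition. Purity 1 plays the lower approximation and partial purity the boundary region.
- Discarded hyperedges are dropped rather than regenerated.
- The cost-sensitive variant is modelled by weighting each class's votes with a per-class cost.

All names (`generate_hyperedges`, `matches`, `prune_hyperedges`, `classify`, `n_edges`, `order`, `keep_ratio`, `costs`) are hypothetical.

```python
import numpy as np

def generate_hyperedges(X, y, radii, n_edges=5, order=2, seed=0):
    """Each training sample spawns n_edges hyperedges over `order` random features (assumed scheme)."""
    rng = np.random.default_rng(seed)
    edges = []
    for i in range(len(X)):
        for _ in range(n_edges):
            feats = rng.choice(X.shape[1], size=order, replace=False)
            edges.append({
                "feats": feats,          # feature indices joined by this hyperedge
                "vals": X[i, feats],     # generating sample's values on those features
                "radius": radii[i],      # generating sample's neighborhood radius
                "label": int(y[i]),      # ASSUMPTION: class taken from the generating sample
            })
    return edges

def matches(edge, X):
    """Boolean mask of rows of X lying within the edge's radius on every selected feature."""
    return np.all(np.abs(X[:, edge["feats"]] - edge["vals"]) <= edge["radius"], axis=1)

def prune_hyperedges(edges, X, y, keep_ratio=0.8):
    """Keep pure edges (lower-approximation analogue) and boundary edges with purity >= keep_ratio."""
    kept = []
    for e in edges:
        covered = matches(e, X)
        if not covered.any():
            continue
        purity = np.mean(y[covered] == e["label"])  # 1.0 = pure, 0.0 = negative-region analogue
        if purity >= keep_ratio:
            kept.append(e)
    return kept  # the thesis regenerates replacements; this sketch simply drops the rest

def classify(edges, x, costs):
    """Cost-weighted vote of all hyperedges covering x; returns None if no edge covers it."""
    votes = {}
    for e in edges:
        if matches(e, x[None, :])[0]:
            votes[e["label"]] = votes.get(e["label"], 0.0) + costs.get(e["label"], 1.0)
    return max(votes, key=votes.get) if votes else None
```

A larger minority entry in `costs` (for example `{0: 1.0, 1: 3.0}`) lets a few minority hyperedges outvote more numerous majority ones. This is one simple way to raise the weight of minority-class hyperedges, not necessarily the cost model used in the thesis.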
Keywords/Search Tags: hypernetwork model, imbalanced data classification, neighborhood rough set model, cost-sensitive learning