Cost-sensitive Hierarchical Classification Method For Imbalanced Data

Posted on: 2022-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zheng
Full Text: PDF
GTID: 2518306485950229
Subject: Computer application technology
Abstract/Summary:
The era of big data has brought exponential growth in information, and every industry has accumulated huge amounts of data. Because data differ in how difficult they are to collect and events differ in how often they occur, the number of samples varies from class to class, which leads to an imbalanced class distribution. This imbalance lowers the classification accuracy of traditional machine learning classifiers. Existing cost-sensitive learning methods can address class imbalance. However, when a task involves hundreds of classes, the classes are also linked by structural relationships. An imbalanced distribution of samples with hierarchical relationships poses great challenges for machine learning classification tasks: (1) the low accuracy on minority classes reduces the overall classification accuracy; (2) in a hierarchical classification task, a classification error at one level is passed down to the subtasks at the next level. In this thesis, we focus on imbalanced data with a hierarchical structure by studying class correlation and hierarchical structure information.

(1) Cost-sensitive hierarchical classification method based on hierarchical correlation of classes. In traditional hierarchical classification methods, the hyperplane tends to favor the majority classes in the dataset and neglects the equally important minority classes. During classification, a divide-and-conquer strategy transforms the task into several small-scale subtasks, decomposing it by the categories at each level. Then, cost-sensitive parameters are established according to the differences in class proportions in the dataset, and a relevant judgment threshold is set to assign different cost weights to different levels. Finally, a cost-sensitive hierarchical classification method based on hierarchical correlation of classes is proposed.

(2) Cost-sensitive hierarchical classification via multi-scale information entropy for data with an imbalanced distribution. Traditional hierarchical classification algorithms ignore the differences in information and in the number of classes reflected in the majority and minority classes. In this thesis, we calculate the information entropy of each class at each level, use the information entropy to establish a threshold constraint strategy that prevents errors from propagating down to the next level, and establish a cost-sensitive function based on hierarchical information entropy and the differences in class proportions. Finally, a cost-sensitive hierarchical classification method via multi-scale information entropy for data with an imbalanced distribution is proposed.
Keywords/Search Tags:Hierarchical classification, imbalanced classes, information entropy, threshold strategy, cost-sensitive