
Research On Nonparametric Frontier-Based Classification Methods

Posted on: 2022-08-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Q Y Jin
GTID: 1488306731467254
Subject: Materials Science and Engineering
Abstract/Summary:
Classification problems arise in many fields, such as pattern recognition, medical diagnosis, bankruptcy prediction and credit rating. Data Envelopment Analysis (DEA) frontier-based classification methods use piecewise linear frontiers to separate different groups of sample data. The benchmarks provided by DEA models give these methods a strong ability to explain their classification results. However, existing studies of DEA frontier-based classification still have shortcomings in both theory and practice:

(1) most of them retain the convexity assumption proposed in the production context, although the meaning of this assumption in the classification context is unclear;
(2) they are usually limited to monotonic attributes, so common non-monotonic attributes require data pre-processing;
(3) they are usually built on radial measures and have not examined how the choice of measure affects classification results;
(4) they give little consideration to how asymmetric misclassification costs affect frontier construction, and especially the design of classification rules.

In view of this, this dissertation investigates nonparametric frontier-based classification methods from four perspectives: measure selection, frontier construction, classification rule design and interpretability of results. It does so in combination with practical classification problems. The main research contents are as follows.

First, a classification method based on a single non-convex frontier is proposed to address non-compensability between monotonic attributes. In the classification context, the convexity assumption of DEA reflects a compensable relationship between attributes. In practice, this assumption can be relaxed when the attributes cannot compensate for one another. A non-convex classification frontier is then constructed based on the Free Disposal Hull (FDH) model. To use the original sample data directly, a Minimize the Sum of Deviations (MSD) model is proposed to identify the monotonicity of each attribute with respect to the classification results. The classification rules are designed based on the Directional Distance Function (DDF) measure. The decision maker can therefore choose projection directions according to their needs, and the resulting benchmarks make reverse classification possible. The proposed single-frontier method is compared with DEA-based Discriminant Analysis (DEA-DA) methods and with common classification models in a small-sample case. The results show that relaxing the convexity assumption can improve the performance of frontier-based classification models. The single-frontier method also shows some advantages over existing classification models and is less affected by imbalanced data.

Second, a classification method based on a single hull is proposed for attributes that have a non-monotonic relationship with the classification results. The single-frontier method is suited to monotonic attributes, but in practice not every attribute is strictly monotonic. For example, in bankruptcy prediction, both a very high and a very low asset-liability ratio increase the probability of bankruptcy. Therefore, building on S-disposability from DEA blocking studies, general disposability is constructed to measure the expected range of values of non-monotonic attributes. Multiple frontiers are constructed under different preference direction vectors, and together they form the hulls that delineate the categories. Depending on whether the attributes are compensable, convex and non-convex hulls can be constructed based on DEA and FDH, respectively. A preference-adaptive directional distance function is proposed to measure the relative distances of sample points to the different frontiers. The case study shows that the non-convex hull fits the training data better than the convex hull classification method and also predicts better. Imbalanced data affect the hull classification methods less than they affect existing classification models.

Third, double-frontier classification methods are constructed for decision makers who seek overall classification accuracy. A single-frontier method performs well when the decision maker has a clear preference for the positive or the negative class. When the decision maker has no preference for either group and pursues overall classification performance, however, a double-frontier method is constructed to make full use of both groups of sample data. Depending on whether the attributes are compensable, convex and non-convex frontiers can be constructed based on DEA and FDH, respectively. Classification rules are designed based on DDF measure results. Case studies show that the choice of measure strongly influences the classification results in fuzzy regions. For classification problems with asymmetric misclassification costs, an algorithm is designed to minimize the total misclassification cost. A cost-sensitive frontier is constructed by allowing the frontier to shift inward, and the asymmetric misclassification cost is also reflected in the design of the classification rules. Empirical results on a bank's credit card default data show that the double-frontier method predicts well under different sample sizes. In particular, its overall performance is better than that of existing classification models on small samples and imbalanced data sets.

Finally, a double-hull classification method is constructed for decision makers who pursue overall classification accuracy in problems with non-monotonic attributes. The positive-class hull is the intersection of the frontiers under different preference direction vectors, and the negative-class hull is the union of the corresponding frontiers. The negative-class samples are first partitioned based on the positive hull, and the negative hulls are then constructed from the resulting subsets. In multi-class classification problems, the attribute values of each class appear non-monotonic because there are more than two classes, even if every attribute is monotonic. Multiple hulls are therefore proposed for multi-class classification. Empirical results on a bank's credit card default data show that constructing a hull classification model for non-monotonic data is both necessary and effective. In general, the hull method predicts better than existing classification models, and its advantage is more pronounced on imbalanced data sets. As the difference in group sizes grows, the hull classification model still predicts the smaller group with high accuracy.
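Two ideas from the abstract can be illustrated in a minimal Python sketch: a non-convex FDH frontier scored with a directional distance function, and a cost-sensitive inward shift of that frontier. It rests on stated assumptions and is not the dissertation's model:

- All attributes are assumed monotonic, with free disposal running downward. The frontier is therefore built from the group whose members tend to have smaller attribute values.
- The direction vector `g` is assumed strictly positive.
- The function names `fdh_ddf` and `cost_sensitive_shift` are hypothetical, as are the rule "assign to the frontier group iff beta >= delta", the grid search over `delta` and the toy data. The grid search stands in for the cost-minimizing algorithm the abstract mentions.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def fdh_ddf(x: Point, group: List[Point], g: Point) -> Tuple[float, List[float]]:
    """Hypothetical helper: DDF score of x against the FDH frontier of `group`.

    Assumption: free disposal runs downward, so the FDH set is every point
    weakly dominated by some sample of `group`. beta is the largest step along
    g keeping x + beta*g in that set: beta >= 0 means x is inside or on the
    non-convex frontier, beta < 0 means x lies outside it.
    """
    if any(gi <= 0 for gi in g):
        raise ValueError("this sketch assumes a strictly positive direction g")
    beta = max(
        min((yi - xi) / gi for yi, xi, gi in zip(y, x, g))  # step allowed by sample y
        for y in group
    )
    # Projection onto the frontier: the benchmark used to explain the result
    # and to suggest attribute changes (reverse classification).
    benchmark = [xi + beta * gi for xi, gi in zip(x, g)]
    return beta, benchmark


def cost_sensitive_shift(
    betas_in_group: List[float],
    betas_other: List[float],
    cost_wrong_in: float,
    cost_wrong_out: float,
) -> Tuple[float, float]:
    """Hypothetical helper: grid-search an inward shift delta >= 0.

    Assumed rule: a point is assigned to the frontier group iff beta >= delta.
    cost_wrong_in:  cost of assigning an other-group point to the frontier group
    cost_wrong_out: cost of assigning a frontier-group point to the other group
    Returns (delta, total training cost); ties go to the smaller delta.
    """
    candidates = sorted({0.0} | {b for b in betas_in_group + betas_other if b > 0})
    best_delta, best_cost = 0.0, float("inf")
    for delta in candidates:
        wrong_out = sum(1 for b in betas_in_group if b < delta)
        wrong_in = sum(1 for b in betas_other if b >= delta)
        cost = cost_wrong_in * wrong_in + cost_wrong_out * wrong_out
        if cost < best_cost:
            best_delta, best_cost = delta, cost
    return best_delta, best_cost


if __name__ == "__main__":
    # Toy data (hypothetical): the frontier group has smaller attribute values.
    frontier_group = [(1, 3), (2, 2), (3, 1), (0.5, 0.5), (1, 1)]
    other_group = [(2.5, 2.5), (3, 3), (1.5, 1.5)]
    g = (1.0, 1.0)
    b_in = [fdh_ddf(p, frontier_group, g)[0] for p in frontier_group]
    b_out = [fdh_ddf(p, frontier_group, g)[0] for p in other_group]
    print(b_in, b_out)
    print(cost_sensitive_shift(b_in, b_out, cost_wrong_in=5, cost_wrong_out=1))
```

In the toy run, the other-group point (1.5, 1.5) lies 0.5 inside the frontier. Assume wrongly admitting such a point costs five times as much as wrongly excluding a frontier-group point. The search then shifts the frontier inward to delta = 1.0, accepting three excluded frontier-group points to keep that one out. Because the FDH score is a maximum over samples of a minimum over attributes, no linear program is needed. A convex DEA frontier would instead require solving one LP per scored point.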
Keywords/Search Tags: Classification, Data Envelopment Analysis, Frontier, Nonconvexity, Non-monotonic