Font Size: a A A

The Study On Ordered Multi-Class Receiver Operating Characteristic(Hyper) Surface

Posted on:2021-03-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:S LiuFull Text:PDF
GTID:1368330602493448Subject:Control Science and Engineering
Abstract/Summary:PDF Full Text Request
The Receiver Operating Characteristic(ROC)analyze framework was first investigated to analysis the two-class statistical models in the fields of science and engineering during World War II.In essence,ROC analysis belongs to a supervised analysis method with respect to the performance of statistical models,which needs to obtain the output of the statistical models of two-class samples or their corresponding cumulative distribution functions in advance.However,in recent years,with the rapid development of science and technology and the increasing demand of human being,ordered three-class or even multi-class tasks,i.e.problems with ordered levels or degrees of relationship between different categories are frequently encountered in practice.As a result,the traditional ROC analysis is no longer applicable to these missions,especially in the areas of machine learning,biomedicine,and signal processing,among others.Three(Multi)-class ROC analysis is also a supervised analysis method that requires the output of the statistical models in advance,but different from the ROC curve,which is only suitable for the analysis of two-class problems,three(multi)-class ROC analysis describes the classifier by means of a(Hyper)surface formed by multiple sets of True Class Fractions(hereafter referred as TCF)according to various sets of thresholds.As an extension of the area under the ROC curve(AUC)in high-dimensional space,the(Hyper)Volume Under the Surface(hereafter referred to as(H)VUS)can be considered as a figure-of-merit to evaluate the performance of a three(multi)-class statistical models,that is,the ability of the models to effectively distinguish each category in a specific task.From the perspective of reliability and robustness,this kind of metrics for evaluating the performance of statistical models has been widely regarded superior to other traditional methods,such as accuracy.However,on the issue of estimating this type of metrics,most of the research is focused on the problem of estimating AUC under continuous sample conditions.The research on the topic of estimating AUC and its variance under discrete samples,estimating VUS and the corresponding variance with continuous or discrete samples is relatively scarce.All of the existing algorithms are suffered from high order time complexity or deviation.In addition,our research indicates that the null distributions of(H)VUS's estimator with small sample sizes are far from normal distribution,the popular method of using normal distribution to perform null hypothesis test will give a misleading result to great extend.Motivated by such unsatisfactory situation,this dissertation mainly focuses on non-parametric analysis,combining with numerical analysis,mathematical statistics,combinatorial mathematics and other mathematical tools to perform theoretical analysis and algorithms design of ordered ROC analysis,and proposes a relatively complete framework for ordered multi-class ROC analysis.The main contributions of this dissertation can be summarized as follows:1.With respect to binary problems,based on the equivalent relationship between Mann Whitney U statistic(MWUS)and AUC,we derive an unbiased estimation of the variance of the AUC's estimator for discrete samples,and a dynamic programming structure is employed to accelerate it.The time complexity of the improved algorithm is reduced from cubic order to linearithmic order.2.For the trichotomous task,1)Propose an unbiased algorithm for estimating the variance of the estimator of VUS for samples under continuous conditions,and use dynamic programming to reduce its computational time complexity from the original quintic order to linearithmic order.The developed algorithm is better than the state of the art(hereinafter referred to as SOTA)in terms of unbiasedness and computational efficiency according to the simulation.2)Develop a fast and unbiased algorithm based on rank to estimate VUS,and then further improve the recursive algorithm for computing the exact null distribution of the estimator of VUS.Based on the asymptotic normality of this estimator,a Gaussian distribution with fixed mean and accurate expression of variance is obtained,which can be used to approximate the null distribution of it where the sample sizes are large enough.Monte Carlo simulation and experiment with real-world dataset verify our theoretical and algorithmic findings in this part.3)Induce an unbiased expression of the variance of the estimator of VUS for discrete samples,and also apply aforementioned computing structure to improve its computational efficiency,which fills the blank about this problem in the field of discrete ROC analysis.3.With respect to multi-class problems,1)An unbiased estimation of the variance of the estimator of HVUS for continuous observations is proposed and then proved by mathematical induction.Grounded at the equivalent relationship between the linear combination of basic events and the estimation for the variance of the estimator of HVUS,we employ the properties of Dyck path and dynamic programming structure to optimize this estimator,and then through comparison with the graph-based algorithm in terms of computational efficiency,we find the critical conditions of choosing these two algorithms in practice.2)A recursive algorithm for computing exact distribution of the estimator of HVUS under null hypothesis is proposed.Based on the asymptotic normality of this estimator,a normal distribution with fix mean and accurate computing method of variance is developed to approximate its null distribution,which can be considered as an alternative method when the sample sizes are large enough.Theoretical findings and developed algorithmic in this dissertation might become a milestone on the topic of multi-class ROC analysis,especially in the context of machine learning,biomedical science and military technology.
Keywords/Search Tags:Receiver Operating Characteristic(ROC), Multi-class, (Hyper)Volume Under the Surface((H)VUS), Non-parametric Method, Dynamic Programming, Unbiased Estimation, Fast Algorithm
PDF Full Text Request
Related items