Diabetes Risk Assessment Based On Finite Mixture Model

Posted on: 2019-07-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L F Han
GTID: 1484306470491854
Subject: Information and Communication Engineering
Abstract/Summary:
Risk assessment is the key to the prevention and control of diabetes. Research on risk assessment helps reduce complications, improve quality of life, and save healthcare resources. Existing risk assessment models are based on universal methods, which assume that the errors have the same variance across all data. However, real data are usually heterogeneous; if the mapping function can be discovered within each local region, the performance of risk assessment can be improved substantially. This dissertation therefore explores a series of effective tree-based finite mixture models (FMMs) that find homogeneous partitions of the input data and parse the input-output relationship, i.e., a multi-objective semi-supervised FMM and a structure-driven mixture of experts. These methods overcome the heterogeneity and nonstationarity of the data and improve the transparency and robustness of the model. For diabetes risk assessment, we apply FMMs to three complex tasks: risk stratification, undiagnosed diabetes detection, and blood glucose estimation. The main contributions are summarized as follows.

(1) A soft-margin mixture of experts (SMMoE) is proposed to handle discontinuous and heterogeneous data, on which conventional universal approaches and divide-and-conquer approaches may fail. SMMoE directly learns homogeneous partitions of the joint input-output space with a max-margin classifier and learns a local expert for each partition with a weighted least-squares model, so that errors are confined to the margin and minimized on the local experts. SMMoE uses the score of the max-margin classifier as the mixing proportions and minimizes the regression error in the soft margin between partitions rather than over hard partitions. Compared with existing mixtures of experts, SMMoE uses the hinge loss to model the transition between regressors: it produces 0 weights within partitions and smooth weights between partitions, which prevents one regressor from dominating the others in estimating the final output. SMMoE improves classification and regression accuracy on the diabetes detection and fasting blood glucose estimation tasks; applied to undiagnosed diabetes detection on the CHNS and NHANES datasets, it outperformed state-of-the-art methods in all problems.

(2) Eclectic rule extraction from the soft-margin mixture of experts is proposed to improve the comprehensibility of SMMoE. SMMoE uses the max-margin classifier as its gate function, but the max-margin classifier is neither comprehensible nor transparent. To solve this problem, we propose eclectic rule extraction from SMMoE based on an interpretable reduced random forest (SMMoE+IRRF). The method uses the homogeneous partitions of the joint input-output space learned by SMMoE and extracts the support vectors (SVs) of the max-margin classifier. We first down-sample the data to the SVs, retaining the informative points that define the hyperplane learned by the gate function, and then use the reduced random forest to generate a small rule set. Rule extraction from SMMoE yields far fewer and shorter rules than generative rule-induction methods, whose large rule sets can make the problem incomprehensible. Applied to the diabetes detection and fasting blood glucose estimation tasks on the CHNS dataset, SMMoE+IRRF improves performance and outperforms state-of-the-art methods.

(3) To improve the robustness of mixtures of linear regressions (MoR), we propose a self-paced mixture of regressions (SPMoR) method, which effectively addresses the intra-component outlier problem and the inter-component imbalance problem of existing MoRs. This is the earliest effort to apply self-paced learning (SPL) to MoR. Compared with existing robust MoR approaches, SPL favors confident samples during training, so SPMoR effectively reduces the regression error caused by outliers. SPMoR uses a novel self-paced regularizer based on the Exclusive LASSO, which improves the inter-component balance of the training data. On the one hand, the encouraged intra-group competition keeps each component sparse and prevents the learner from using outlier data within each component. On the other hand, the discouraged inter-group competition promotes group diversity and induces the learner to select balanced training data from different components. The model can be flexibly extended to hierarchical mixtures of experts. Experiments on synthetic examples and on real-world glucose estimation show that SPMoR outperforms state-of-the-art methods.

(4) When the labeled data are incomplete, i.e., they do not cover all categories, semi-supervised clustering may generate empty clusters or clusters with very few points. To handle this case, we propose a pairwise and size constrained clustering model (PSCC), which integrates multiple types of supervision, i.e., pairwise constraints and size constraints, into a Gaussian mixture model (GMM) and formulates GMM-based semi-supervised clustering as a multi-objective optimization problem that balances the different types of side information. PSCC derives its cost function from the GMM and penalizes the distribution of each component according to the quality information of the constraints, so its solution is a weighted trade-off between the mean squared error (MSE) and the cluster sizes. In addition, a weighted KKZ method based on multiple Gaussian distributions is proposed to initialize the centroids; it eliminates the impact of outliers, which the traditional KKZ method may choose as initial centroids. Results on multiple datasets show that PSCC produces feasible and balanced stratification solutions that avoid clusters with few points, improves clustering accuracy on risk stratification tasks, and outperforms other semi-supervised clustering methods.
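The hinge-based transition between experts in contribution (1) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not SMMoE itself: two experts, a scalar input, and a linear gate score s(x) = w*x + b standing in for the max-margin classifier's output. The hypothetical `hinge_gate` gives expert 2 a weight of min(1, max(0, 1 - s) / 2), i.e. half the hinge loss capped at 1, so each expert's weight is exactly 0 once |s| >= 1 and blends linearly inside the margin.

```python
def hinge_gate(score):
    """Weight of expert 1 for gate score s (hypothetical SMMoE-style gate).

    Expert 2 receives 1 - result = min(1, max(0, 1 - s) / 2), half the hinge
    loss capped at 1. Expert 1 therefore gets exactly 1 for s >= 1, exactly 0
    for s <= -1, and a linear blend inside the margin -1 < s < 1.
    """
    return min(1.0, max(0.0, (1.0 + score) / 2.0))


def mixture_predict(x, w_gate, b_gate, expert1, expert2):
    """Blend two local experts using the hinge-based gate weight."""
    s = w_gate * x + b_gate  # assumed linear max-margin score
    pi1 = hinge_gate(s)
    return pi1 * expert1(x) + (1.0 - pi1) * expert2(x)


if __name__ == "__main__":
    def expert1(x):  # hypothetical local regressor for partition 1
        return 2.0 * x

    def expert2(x):  # hypothetical local regressor for partition 2
        return -x

    for x in (0.0, 5.0, 10.0):
        print(x, mixture_predict(x, 1.0, -5.0, expert1, expert2))
```

With the gate score x - 5, the inputs 0 and 10 lie outside the margin and are predicted by a single expert, while x = 5 sits on the boundary and receives an equal blend (2.5).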
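The down-sample-then-extract idea in contribution (2) can be sketched as follows, assuming a linear gate f(x) = w . x + b is already trained and labels are in {-1, +1}. Every support vector of a soft-margin SVM satisfies y * f(x) <= 1, so `margin_points` keeps a superset of the SVs. The interpretable reduced random forest is swapped out here for a single decision stump (`fit_stump`, hypothetical), the smallest tree-based rule learner; it is not the dissertation's IRRF.

```python
def margin_points(data, w, b):
    """Keep samples with y * f(x) <= 1, where f(x) = w . x + b.

    Every support vector of a soft-margin SVM satisfies this condition, so the
    filter keeps a superset of the support vectors of the given linear gate.
    """
    kept = []
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if y * score <= 1:
            kept.append((x, y))
    return kept


def fit_stump(points):
    """Find the rule 'IF x[j] > t THEN polarity ELSE -polarity' with fewest errors.

    Returns (errors, feature, threshold, polarity): a depth-1 decision tree,
    used here as a stand-in for the interpretable reduced random forest.
    """
    best = None
    for j in range(len(points[0][0])):
        for t in sorted({x[j] for x, _ in points}):
            for polarity in (1, -1):
                errors = sum(
                    1 for x, y in points
                    if (polarity if x[j] > t else -polarity) != y
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, polarity)
    return best


if __name__ == "__main__":
    # Toy data; the gate f(x) = x[0] - 5 is assumed, not learned here.
    data = [([1.0, 3.0], -1), ([4.2, 8.0], -1), ([4.5, 9.0], -1),
            ([5.5, 1.0], 1), ([5.8, 9.0], 1), ([9.0, 2.0], 1)]
    kept = margin_points(data, [1.0, 0.0], -5.0)
    errors, j, t, pol = fit_stump(kept)
    print(f"IF x[{j}] > {t} THEN {pol:+d} ELSE {-pol:+d}  ({errors} errors)")
```

The two points far from the boundary are discarded, and the rule learned on the four remaining margin points reproduces the gate's split on feature 0.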
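The group-aware sample selection behind contribution (3) can be sketched with a swapped-in technique: the closed-form selection rule of self-paced learning with diversity (SPLD), whose regularizer is -lam * sum(v) - gamma * sum over groups of ||v_g||_2. It is not the Exclusive-LASSO regularizer of SPMoR, whose exact form the abstract does not give, but it shows the same two effects. As an assumption, each sample's loss is its squared residual under its assigned regressor and its group is the mixture component.

```python
import math


def spld_select(losses, groups, lam, gamma):
    """Closed-form sample selection of self-paced learning with diversity.

    Minimizes  sum_i v_i * L_i - lam * sum_i v_i - gamma * sum_g ||v_g||_2
    over binary v. Inside each group, samples are ranked by loss and the
    k-th one is selected iff L < lam + gamma / (sqrt(k) + sqrt(k - 1)).
    """
    v = [0] * len(losses)
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    for idx in members.values():
        idx.sort(key=lambda i: losses[i])
        for k, i in enumerate(idx, start=1):
            threshold = lam + gamma / (math.sqrt(k) + math.sqrt(k - 1))
            if losses[i] >= threshold:
                break  # losses rise and thresholds fall, so stop here
            v[i] = 1
    return v


if __name__ == "__main__":
    # Component 0 contains an outlier (residual 5.0); component 1 has only
    # harder samples. Values are illustrative.
    losses = [0.1, 0.2, 0.3, 5.0, 0.8, 0.9]
    groups = [0, 0, 0, 0, 1, 1]
    print(spld_select(losses, groups, lam=0.5, gamma=0.5))
    print(spld_select(losses, groups, lam=0.5, gamma=0.0))  # plain SPL
```

The threshold shrinks with each sample already taken from a group (intra-group competition), so the outlier in component 0 is rejected, while the first sample of an under-represented group gets the largest bonus (inter-group diversity), so component 1 still contributes its best sample. With gamma = 0 the rule reduces to plain SPL and component 1 is dropped entirely.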
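The outlier problem of KKZ initialization mentioned in contribution (4) can be reproduced with a short sketch. `kkz` follows one common formulation of KKZ (start from the largest-norm point, then repeatedly add the point farthest from its nearest chosen centroid). The weighting is hypothetical: `density_weights` uses a Gaussian kernel density estimate as a simple stand-in for the dissertation's multiple-Gaussian weighting, whose exact form is not given.

```python
import math


def kkz(points, k, weights=None):
    """KKZ seeding: start from the largest-norm point, then repeatedly add the
    point farthest from its nearest chosen centroid. Optional per-point
    weights multiply every selection score."""
    if weights is None:
        weights = [1.0] * len(points)
    chosen = [max(range(len(points)),
                  key=lambda i: weights[i] * math.hypot(*points[i]))]
    while len(chosen) < k:
        rest = [i for i in range(len(points)) if i not in chosen]
        chosen.append(max(rest, key=lambda i: weights[i] * min(
            math.dist(points[i], points[c]) for c in chosen)))
    return [points[i] for i in chosen]


def density_weights(points, bandwidth):
    """Hypothetical weighting: Gaussian kernel density estimate at each point,
    so isolated points (likely outliers) receive small weights."""
    n = len(points)
    return [sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2))
                for q in points) / n
            for p in points]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
           (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (12.0, 0.0)]  # last = outlier
    print("plain KKZ:   ", kkz(pts, 2))
    print("weighted KKZ:", kkz(pts, 2, density_weights(pts, bandwidth=1.0)))
```

On this toy data, plain KKZ seeds one centroid on the outlier (12, 0), while the density-weighted variant places one seed in each of the two real clusters.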
Keywords/Search Tags: Finite Mixture Model, Mixture of Experts, Diabetes Risk Assessment, Soft-margin Classifier, Rule Extraction, Self-paced Learning, Semi-supervised Clustering