
Research On Classifiers Interpretability And Application In Disease Aided Diagnosis

Posted on: 2018-05-08  Degree: Master  Type: Thesis
Country: China  Candidate: Y Ma  Full Text: PDF
GTID: 2404330542477041  Subject: Electronic and communication engineering
Abstract/Summary:
In machine learning, most classifiers achieve good classification performance, but the models are usually "black boxes," which makes it difficult for users to interpret the reasons behind a prediction. Improving classifier accuracy has been widely researched, whereas the interpretability of classifiers still requires further study. Driven by practical application requirements, attention has gradually turned to computer-aided diagnosis, which demands not only high accuracy but also, urgently, interpretable classification. In this thesis, supported by the National Natural Science Foundation of China (61471124), we study the explanation of classifiers, propose a novel technique that explains the predictions of the random forests classifier, and apply the results to disease-aided diagnosis to achieve interpretability. The specific contents are as follows:

Firstly, a classification method combining the random forests classifier with t-SNE is proposed. To address the poor interpretability of random forests, a t-SNE-based visualization method is proposed that reveals the intrinsic relationships in the data captured by the internal random forests model. On this basis, we propose a novel classification method that combines the random forests model with t-SNE. First, pairwise similarity measures between samples are derived from the random forests classifier. Then, t-SNE manifold learning is used to produce a low-dimensional space. The low-dimensional representation serves as new input features to train a random forests model, and test samples are mapped into this space to infer their labels. Finally, the proposed method is applied to the pathological diagnosis of fetal heart rate. Experimental results show that it achieves higher classification accuracy while making the differences between classes observable.

Secondly, a method for extracting rules from random forests based on decision tree selection and sparse coding is designed. Random forests is an ensemble classifier consisting of many decision trees, and IF-THEN rules can be extracted from each tree. Compared with a single decision tree, a random forest comprises a large number of rules and is therefore hard to interpret. First, to improve the interpretability of random forests classification, a subset of decision trees is selected by sequential backward selection, which preserves accuracy. Then, sparse coding is used to extract a sparse set of rules from the selected trees. Finally, the proposed method is applied to the diagnosis of real fetal heart rate disorders: different types of fetal heart rate data are obtained from hospitals, followed by signal de-noising, feature extraction, model training, and interpretation of results. Experimental results show that the proposed method needs only three classification rules while achieving an accuracy above 90%; it offers a trade-off between prediction accuracy and model interpretability, and the design is better suited to the requirements of disease diagnosis.

To summarize, this thesis proposes a new solution to the lack of interpretability of the random forests classifier. In fetal heart rate diagnosis, the results show that both classification accuracy and interpretability can be ensured, which lays a foundation for the future design of a disease diagnosis system that people can easily understand.
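The tree-selection step described above can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so this example assumes that trees are removed one at a time as long as majority-vote accuracy on a held-out validation set does not fall below that of the full forest. The function names (`majority_vote`, `backward_select`), the `tolerance` and `min_trees` parameters, the tie-breaking rule, and the toy data are all hypothetical and are not taken from the thesis.

```python
from collections import Counter

def majority_vote(tree_preds, subset):
    """Label each sample by majority vote over the trees listed in `subset`."""
    n_samples = len(tree_preds[0])
    labels = []
    for i in range(n_samples):
        votes = Counter(tree_preds[t][i] for t in subset)
        top = max(votes.values())
        # Assumption: ties are broken by taking the smallest label.
        labels.append(min(lbl for lbl, c in votes.items() if c == top))
    return labels

def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def backward_select(tree_preds, y_val, min_trees=1, tolerance=0.0):
    """Sequential backward selection over the trees of an ensemble.

    At each step, drop the tree whose removal leaves the highest validation
    accuracy. Stop when that accuracy would fall more than `tolerance` below
    the full-ensemble baseline, or when only `min_trees` trees remain.
    """
    subset = list(range(len(tree_preds)))
    baseline = accuracy(majority_vote(tree_preds, subset), y_val)
    while len(subset) > min_trees:
        best_acc, best_drop = -1.0, None
        for t in subset:
            candidate = [s for s in subset if s != t]
            acc = accuracy(majority_vote(tree_preds, candidate), y_val)
            if acc > best_acc:
                best_acc, best_drop = acc, t
        if best_acc < baseline - tolerance:
            break
        subset.remove(best_drop)
    return subset

# Toy data (hypothetical): per-tree predictions on six validation samples.
y_val = [0, 1, 0, 1, 1, 0]
tree_preds = [
    [0, 1, 0, 1, 1, 0],  # tree 0: all correct
    [0, 1, 0, 1, 1, 0],  # tree 1: all correct
    [1, 0, 1, 0, 0, 1],  # tree 2: all wrong
    [0, 1, 0, 1, 0, 0],  # tree 3: one error
    [1, 1, 0, 1, 1, 0],  # tree 4: one error
]
selected = backward_select(tree_preds, y_val, min_trees=3)
print(selected, accuracy(majority_vote(tree_preds, selected), y_val))
```

In practice the per-tree predictions would come from the trained forest evaluated on validation data, and the surviving trees would then be passed to the rule-extraction stage. Sparse coding of the rules is not covered by this sketch.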
Keywords/Search Tags:computer-aided diagnosis, interpretability, random forests, t-SNE, sparse coding