Recently, with the development of machine learning methods and fuzzy theory, fuzzy clustering and fuzzy classification methods have attracted more and more attention. Compared with traditional clustering and classification methods, fuzzy clustering and fuzzy classification methods can handle ambiguity and uncertainty. Moreover, because fuzzy theory is introduced into these methods, they gain interpretability (e.g., fuzzy linguistic-based interpretability) and become more practical. We therefore focus on fuzzy clustering and fuzzy classification methods, i.e., interpretable fuzzy clustering methods, traditional fuzzy classification methods, and interpretable fuzzy classification methods. The research work covers the following two main aspects: (1) for fuzzy clustering, the classical density peaks clustering method (DPC) is fuzzified by introducing fuzzy operators, which gives it fuzzy interpretability and improves its clustering performance; (2) for fuzzy classification, three typical methods are selected: the fuzzy k-nearest neighbors classification method (FKNN), the fuzzy random forest (FRF), and TSK-based (Takagi-Sugeno-Kang-based) fuzzy ensemble methods. Among them, we focus on improving the performance of FKNN; for FRF and the TSK-based fuzzy ensemble methods, on the basis of their linguistic-based interpretability, we attempt to provide new kinds of interpretability or improve their linguistic-based interpretability, while simultaneously enhancing their generalization capability and running speed. Specifically, the research work is stated as follows.

(1) The clustering performance of the density peaks clustering method (DPC) depends heavily on how the kernel-based density peak is calculated, which raises the following two problems: 1) whether the kernel-based density-peak calculation can effectively handle the fuzzy and
uncertain data points in a dataset; and 2) whether the concept of the density peak can be explained and redefined from the perspective of soft (i.e., fuzzy) partitioning so as to enhance clustering performance. To solve these two problems, a method for calculating the fuzzy density peak is first proposed, improving the clustering method's ability to handle ambiguity and uncertainty as well as its flexibility. The fuzzy density peak is calculated by using a fuzzy operator (i.e., an S-norm operator) to couple the fuzzy memberships between data points and their neighbors. Then, based on the fuzzy density peak and the framework of DPC, the fuzzy density peaks clustering method (FDPC) is proposed. By adjusting appropriate fuzzy parameters, FDPC improves both its clustering performance and its flexibility. The experimental results show that, in most cases, FDPC achieves better clustering performance and interpretability when appropriate fuzzy parameters are selected for the fuzzy operators.

(2) The traditional fuzzy k-nearest neighbor classification method (FKNN) classifies testing samples by using the same k value for all of them, which seriously weakens its classification performance. We therefore discuss the feasibility of setting different k values for different testing samples and propose a new classification method, called FKNN with adaptive nearest neighbors (A-FKNN), which learns a unique optimal k value for each testing sample. In the training stage, A-FKNN first uses a sparse reconstruction model to self-represent all training samples, then learns the optimal k value for each training sample and takes it as that sample's new label. Finally, a decision tree (the A-FKNN tree) is built on all training samples and their new labels, in which each leaf node stores the corresponding optimal k value. In the testing stage, A-FKNN determines the optimal k value for each testing
sample by searching the A-FKNN tree and then applies FKNN to predict the testing sample. In addition, to improve the testing speed of A-FKNN, a fast version, called fast A-FKNN (FA-FKNN), is proposed. Compared with A-FKNN, FA-FKNN builds a fast A-FKNN tree (FA-FKNN tree) in the training stage, in which each leaf node stores both the optimal k value and a subset of the training samples. The experimental results show that both A-FKNN and FA-FKNN outperform the comparative methods in terms of testing accuracy, and FA-FKNN runs faster than A-FKNN in the testing stage.

(3) To improve the generalization performance and running speed of the fuzzy random forest method (FRF) on high-dimensional datasets, an enhanced fuzzy random forest using double randomness and copying attributes from a dynamic dictionary (E-FRF) is proposed. In addition to retaining the original randomness of FRF, E-FRF introduces double randomness into the generation of both the candidate features and the best splitting features of each fuzzy decision tree, so as to increase generalization performance. Moreover, to avoid computing the new fuzzy information gain (NFG) of all candidate features, some NFG values are obtained quickly by copying them from a dynamically generated dictionary. The consistency of E-FRF is also proved theoretically. The experimental results show that E-FRF keeps at least comparable generalization capability to the comparative methods on most high-dimensional datasets, and it outperforms FRF in terms of testing accuracy and running speed.

(4) Motivated by both the commonly used "from wholly coarse to locally fine" cognitive behavior and the recent finding that a simple yet interpretable linear model should be a basic component of a classifier, a novel hybrid ensemble classifier called H-TSK-FC (hybrid TSK fuzzy classifier) and its residual sketch learning method (RSL) are proposed. H-TSK-FC essentially shares the virtues of both deep and wide interpretable
fuzzy classifiers and simultaneously has both feature-importance-based and linguistic-based interpretability. The core of RSL includes the following: 1) a global linear sub-classifier over all original features of all training samples is generated quickly by the proposed sparse-representation-based linear sub-classifier training procedure, which identifies the importance of each feature and partitions the output residuals of the incorrectly classified training samples into several residual sketches; 2) by using the enhanced soft subspace clustering method (ESSC) for the linguistically interpretable antecedents of the fuzzy rules and the least learning machine (LLM) for their consequents on the residual sketches, several interpretable TSK fuzzy sub-classifiers are generated and stacked in parallel through the residual sketches to achieve local refinements; 3) in the testing stage, the final predictions are made by applying the proposed nearest-label voting to the outputs of all the constructed sub-classifiers, further enhancing H-TSK-FC's generalization capability. In contrast to existing deep stacked or wide interpretable TSK fuzzy classifiers, and beyond the extra feature-importance-based interpretability, our extensive experimental results indicate that H-TSK-FC indeed guarantees enhanced or at least comparable generalization capability, faster running speed, and high linguistic interpretability (i.e., fewer rules and/or TSK fuzzy sub-classifiers) for the resultant ensemble classifier.
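To make the fuzzy density peak of (1) concrete, the calculation can be sketched as follows. This is an illustrative reading only, not FDPC's exact formulation: the membership function (a Gaussian kernel of neighbor distances) and the specific S-norm (the probabilistic sum S(a, b) = a + b - ab) are assumptions made for this sketch.

```python
import numpy as np

# Illustrative sketch only: FDPC's exact membership function and S-norm are
# not specified here, so a Gaussian-kernel membership and the probabilistic-
# sum S-norm S(a, b) = a + b - a*b are assumed.

def gaussian_membership(distances, bandwidth):
    """Fuzzy membership of each neighbor, decaying with distance."""
    return np.exp(-(distances / bandwidth) ** 2)

def s_norm_couple(memberships):
    """Couple the neighbor memberships with the probabilistic-sum S-norm."""
    peak = 0.0
    for m in memberships:
        peak = peak + m - peak * m  # S(a, b) = a + b - ab, stays in [0, 1]
    return peak

def fuzzy_density_peaks(X, bandwidth=1.0, k=5):
    """Fuzzy density of every point: the S-norm coupling of the fuzzy
    memberships of its k nearest neighbors."""
    n = X.shape[0]
    densities = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        nearest = np.argsort(d)[1:k + 1]  # skip the point itself
        densities[i] = s_norm_couple(gaussian_membership(d[nearest], bandwidth))
    return densities

np.random.seed(0)
X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 6.0])
rho = fuzzy_density_peaks(X)
# Points deep inside a cluster obtain a higher fuzzy density than outliers.
```

Because each membership lies in [0, 1] and an S-norm keeps its result in [0, 1], the resulting fuzzy density is itself a bounded membership degree, which is what lends FDPC its fuzzy interpretability.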