Semi-Supervised Deep Fuzzy C-Mean Clustering And Classification

Posted on: 2020-06-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Ali Arshad
GTID: 1368330602963875
Subject: Computer application technology
Abstract/Summary:
Semi-supervised learning has been applied successfully in fields such as data mining and machine learning for dynamic data analysis. Class imbalance is one of the most challenging issues in classification. Beyond the imbalance itself, redundant and irrelevant features used to train a model can degrade its performance. In recent years, many researchers have focused on binary and multi-class imbalanced classification. The key points of our study are as follows.

1. A semi-supervised Deep Fuzzy C-Mean (DFCM) clustering approach is proposed for binary and multi-class imbalanced data classification. It can be applied when data boundaries are not clearly defined and extra parameters are needed to reduce the statistical closeness. The proposed approach is essentially a two-stage data pre-processing technique for the classification model.

2. A novel semi-supervised DFCM clustering-based feature extraction technique is proposed. It creates new features from deep multi-clusters of supervised and unsupervised datasets, built with Fuzzy C-Mean (FCM) clustering, that tend to maximize intra-cluster classes and intra-cluster features. We also apply a decomposition strategy to the multi-clusters to extend the approach to multi-class imbalanced datasets; it associates, one-vs.-one, the maximum similarity between intra-cluster classes and intra-cluster features. FCM is used to handle the overlapping problem.

3. A method that combines feature selection with re-sampling (random under-sampling and random over-sampling) is proposed to reduce noisy data and handle the imbalance problem for classification. Finally, the classification model is based on the maximum homogeneity between the features of labeled and unlabeled data. Models combined with the novel DFCM data pre-processing approach perform better because of its ability to identify and combine essential information in the data features.

4. To check the efficiency of the proposed approach, we chose datasets from real-world software projects (NASA and Eclipse) for binary-class imbalanced classification and 18 UCI benchmark datasets for multi-class imbalanced classification. We compared our approach with state-of-the-art binary-class and multi-class imbalance learning algorithms using the performance measures Pd, Accuracy, F-Measure, and Area Under the Curve (AUC), and further investigated the factors that influence our approach. The results show that the proposed DFCM technique is stable and effective on all types of binary and multi-class imbalanced datasets.
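The abstract relies on Fuzzy C-Mean (FCM) clustering to cope with overlapping classes. For readers unfamiliar with it, the following is a minimal sketch of the standard FCM algorithm only, not of the dissertation's DFCM pipeline. The function name `fuzzy_c_means`, the fuzzifier `m = 2`, the Euclidean distance, the random initialisation, and the toy two-blob data are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative sketch, not the thesis's DFCM).

    Returns (centers, U), where U[i, k] is the soft membership of
    sample i in cluster k and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Cluster centers: means of the samples weighted by U**m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero

        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return centers, U


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated 2-D blobs.
    rng = np.random.default_rng(42)
    X = np.vstack([
        rng.normal(0.0, 0.5, size=(50, 2)),
        rng.normal(5.0, 0.5, size=(50, 2)),
    ])
    centers, U = fuzzy_c_means(X, n_clusters=2)
    print("centers:\n", centers)
    print("first sample memberships:", U[0])
```

Because each sample receives a graded membership in every cluster rather than a single hard label, points that lie between classes are represented as partially belonging to both, which is the property the abstract appeals to when it says FCM handles the overlapping problem.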
Keywords/Search Tags:Semi-supervised Learning, Deep Fuzzy C-Mean clustering, Feature Learning, Imbalanced dataset, Multi-class Classification