Research On The Classification Method And Application Of Mahalanobis-Taguchi System For Imbalanced Data

Posted on: 2020-03-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y P Gu
GTID: 1368330602961084
Subject: Management Science and Engineering
Abstract/Summary:
As one of the research hotspots in data mining, classification technology is applied in many industries. Most existing classification methods assume balanced training samples, so they achieve satisfactory results when the data are balanced. However, this assumption often fails in practical problems such as credit evaluation, fault diagnosis, and intrusion monitoring. In these cases the classes are usually imbalanced, and the imbalance is often accompanied by class overlap, noise interference, and similar phenomena. Solving the classification problem for imbalanced data therefore has strong theoretical significance and practical value.

The Mahalanobis-Taguchi System (MTS) is a quantitative pattern recognition method for the classification, diagnosis, and prediction of multivariate data. MTS has several advantages. It is a data-based analysis method that can reduce the number of variables, which simplifies problems and improves the accuracy and efficiency of classification. It also builds a continuous measurement scale and calculates the degree to which a test sample deviates from the reference space, which helps in choosing corresponding solutions and makes problem solving more flexible. However, there are still deficiencies in the theory and application of MTS. In this dissertation, multivariate control charts, a chaotic binary particle swarm optimization algorithm, kernel functions, and the AdaBoost algorithm are used to improve MTS and make it suitable for imbalanced data classification. The main research work is as follows:

(1) Research on the sample optimization of MTS reference space for imbalanced data

When MTS is used to classify imbalanced data, the majority class, such as healthy people in disease diagnosis, is used to establish the reference space. Because the reference space determined by traditional MTS from professional knowledge or historical experience may contain outliers or noise, all samples in the reference space are analyzed with a multivariate control chart to ensure the effectiveness of the MTS method at the source. UCI data sets are used for a feasibility analysis, and the results show that MTS classification performance improves after multivariate control charts are used to optimize the reference space samples.

(2) Research on the variable optimization of MTS reference space for imbalanced data

Traditional MTS uses orthogonal arrays and the signal-to-noise ratio (SNR) to optimize the reference space. However, it has been shown that the orthogonal array method is not the best strategy for selecting the optimal subset of variables. This dissertation considers both the classification effect on imbalanced data and the capability for dimensionality reduction, and uses the chaotic binary particle swarm optimization algorithm to establish a variable optimization model for the MTS reference space. The classification error rate, a smaller-the-better characteristic, and the dimensionality reduction, a larger-the-better characteristic, are the optimization goals. The normal samples, abnormal samples, and feature variables are the optimization objects, and constraints such as the types of the optimization objects and their ranges of values are analyzed. To verify its classification ability, the model is compared with other commonly used classification methods on common classification data sets. The results show that MTS with the optimization algorithm achieves both a better classification effect on imbalanced data and better dimensionality reduction, so it can be applied to the classification of imbalanced data.

(3) Research on the measurement scale improvement of MTS for imbalanced data

The classification principle of MTS is to compare the Mahalanobis distance of a sample with a threshold to determine which class the sample belongs to. If the classes overlap, the discrimination ability of the Mahalanobis distance degrades. MTS is therefore mainly applied to linearly separable data, where it achieves good results. For linearly inseparable data, however, the classification effect of MTS is poor and the misclassification rate for the minority class is higher, especially for imbalanced data. To address this, ideas from support vector machines, kernel Fisher discriminant analysis, and other algorithms are borrowed: a kernel function is introduced into MTS and combined with the Mahalanobis distance to form the kernel Mahalanobis distance as a new measurement scale. Through the implicit nonlinear mapping of the kernel function, the input data are mapped into a high-dimensional feature space, where linear classification can then be performed. In this way, MTS can handle the class overlap problem well. Finally, the method is applied in an empirical study of fault diagnosis for anti-interference signal acquisition equipment. The results show that it deals well with overlapping classes and has a good application effect.

(4) Research on MTS classification with AdaBoost ensemble algorithm for imbalanced data

Ensemble algorithms can stabilize classification results and achieve higher accuracy. Currently, the two most widely used ensemble algorithms are Bagging and AdaBoost. Although AdaBoost is slightly more complicated than Bagging, its classification effect is better, especially when the data are imbalanced. In view of the nature of imbalanced data, the optimized MTS method is used as the base classifier and integrated with the AdaBoost algorithm, and multiple evaluation indexes are used for experimental analysis on reference data sets. The improved algorithm is applied in a case study of financial crisis warning for Chinese listed companies and, because financial data exhibit class overlap, the kernel Mahalanobis distance is used as its measurement scale. Compared with traditional MTS, optimized MTS, and other common single classifiers, the ensemble method shows better classification performance and dimensionality reduction, and its results are more stable.

This dissertation takes the classification of imbalanced data as its research object, the improvement of MTS as its main line, and the theory of optimization algorithms and kernel functions as its main methods. The goal is to develop MTS into an efficient classification method that is suitable for imbalanced data and to apply it to practical problems.
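
The classification principle described in (3), comparing a sample's Mahalanobis distance to the reference space with a threshold, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dissertation's implementation: it uses the commonly described MTS scaling (squared distance on standardized variables, divided by the number of variables), and the function names, the toy data, and the threshold value of 3.0 are all hypothetical.

```python
import math

def fit_reference_space(samples):
    """Means, sample standard deviations, and correlation matrix of the
    reference (majority-class) samples."""
    n, k = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in samples) / (n - 1))
            for j in range(k)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in samples]
    corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]
    return means, stds, corr

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular (no perfectly collinear variables)."""
    k = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return x

def scaled_md(sample, reference):
    """Scaled squared Mahalanobis distance: z^T C^{-1} z / k."""
    means, stds, corr = reference
    k = len(sample)
    z = [(sample[j] - means[j]) / stds[j] for j in range(k)]
    w = solve(corr, z)  # w = C^{-1} z without forming the inverse
    return sum(zi * wi for zi, wi in zip(z, w)) / k

def classify(sample, reference, threshold):
    """Label a sample abnormal when its distance exceeds the threshold."""
    return "abnormal" if scaled_md(sample, reference) > threshold else "normal"

# Hypothetical toy reference space: two strongly correlated variables.
healthy = [(1.0, 2.0), (2.0, 3.1), (3.0, 3.9), (4.0, 5.2), (5.0, 5.8), (2.5, 3.6)]
reference = fit_reference_space(healthy)
THRESHOLD = 3.0  # assumed value; the abstract does not say how it is chosen

print(classify((3.0, 4.0), reference, THRESHOLD))  # follows the correlation
print(classify((5.0, 1.0), reference, THRESHOLD))  # breaks the correlation
```

With this scaling, the average distance of the reference samples themselves is (n - 1) / n, close to 1, which is what makes the distance usable as a continuous measurement scale. The kernel Mahalanobis distance from (3) would replace the standardized variables with an implicit feature-space mapping; that step is not shown here.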
Keywords/Search Tags: Mahalanobis-Taguchi System, Imbalanced data, Multivariate control chart, Chaos binary particle swarm optimization, Kernel function, AdaBoost ensemble algorithm