
The Study Of Several Key Issues On Large Data Sets Classification Techniques In Pattern Recognition

Posted on: 2013-03-05    Degree: Doctor    Type: Dissertation
Country: China    Candidate: W J Hu    Full Text: PDF
GTID: 1228330395464904    Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Pattern recognition (PR) is an important research task in machine learning, and classification is a fundamental topic in pattern recognition. Classification methods for large datasets have been studied extensively, and many of the resulting techniques have been applied successfully in practice. However, many issues in the classification task still need to be explored in depth, e.g. learning efficiency, decision efficiency and privacy preservation. Therefore, we focus on large datasets and address these issues in our study. The main contributions include:

(1) We discuss classification methods from the viewpoint of the classification margin and speed up their learning process. A new classification method, the Maximum Vector-Angular Margin Classifier (MAMC), is proposed based on a new concept of margin called the vector-angular margin. The kernelized MAMC can be equivalently formulated as a Center-Constrained Minimum Enclosing Ball (CC-MEB), so MAMC can be extended to the Maximum Vector-Angular Margin Core Vector Machine (MAM-CVM) by introducing the Core Vector Machine (CVM) method to train quickly on large datasets (a core-set sketch of this MEB machinery is given below). Besides, we construct the classification margin using the difference of densities (DoD) and present a classification method called the Maximum Margin Logistic Vector Machine (MMLVM). The derived generalization error bound guarantees that MMLVM performs better on larger datasets. Experimental results on real-world datasets validate the effectiveness of the above methods.

(2) We discuss the privacy problem of classification methods and present learning methods that both preserve privacy and improve testing speed. It is proved that Gaussian kernel density estimation under the Integrated Squared Error (ISE) criterion is equivalent to the minimum enclosing ball (MEB). Based on this result, a new MEB learning method for privacy cloud data, called Privacy Cloud Calibration MEB (PCC-MEB), is proposed. PCC-MEB is then extended to Fuzzy Privacy Cloud Calibration MEB (FPCC-MEB) by introducing a fuzzy membership function to resolve unclassifiable zones among classes. Besides, to improve the decision speed of one-class SVDD and reduce the risk of model privacy violation, a fast decision approach called FDA-SVDD is proposed, which uses the preimage in the original feature space corresponding to the center of the sphere in the kernel feature space (see the decision sketch below). As a result, the decision complexity of SVDD is reduced from O(n) to O(1) and the model privacy can be preserved.

(3) We address the nonlinear capability of the Linear Support Vector Machine (LSVM) and develop a new model called the Fast Model of Ensembling LSVMs (FMELSVM). Although LSVM is simple, fast in both training and testing, and able to preserve model privacy, it cannot be applied to nonlinear datasets. To address this issue, we design FMELSVM based on LSVM. FMELSVM uses a combination of nonlinear Radial Basis Functions (RBFs) to fit a nonlinear decision function, and it can be solved efficiently by gradient descent that maximizes a log-likelihood function, i.e. minimizes the cross-entropy error on the training data (a minimal fitting sketch is given below). Experimental results show that the nonlinear capability of LSVM is improved and that training and decision speeds are also boosted.

(4) We realize fast learning of the generalized hypersphere.
We focus on the generalized soft-margin MEB models and propose a fast learning approach called Fast Learning of the Generalized MEB (FL-GMEB). Because the inequality constraint of the generalized MEB is changed, it cannot be treated as a standard MEB problem, so CVM cannot be used directly to train the generalized MEB on large datasets. To address this issue, FL-GMEB slightly relaxes the constraints of the generalized MEB so that it becomes equivalent to the corresponding CC-MEB, which can be solved with its Core Set (CS) by CVM. FL-GMEB then builds an extended core set (ECS) by expanding the neighbors of some samples in the CS into the ECS, following the inverse idea of Locally Linear Embedding (LLE). Finally, FL-GMEB takes the optimized weights on the ECS as the approximate solution of the generalized MEB. As a result, FL-GMEB obtains a soft hypersphere that preserves the structure of the local boundary, which makes it robust to outliers around the boundary.
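The CC-MEB/CVM machinery referred to in (1) and (4) rests on the fact that a minimum enclosing ball can be approximated from a small core set. Below is a minimal sketch of the classical Badoiu-Clarkson style core-set iteration in input space, not the dissertation's kernelized CVM solver; the function name coreset_meb, the tolerance eps and the toy data are illustrative assumptions.

```python
import numpy as np

def coreset_meb(X, eps=0.1):
    """(1+eps)-approximate Minimum Enclosing Ball via a core set.
    The points picked as 'farthest from the current center' form the
    core set; CVM applies the same idea in a kernel feature space."""
    c = X[0].copy()                        # start from an arbitrary point
    core_idx = {0}
    n_iter = int(np.ceil(1.0 / eps ** 2))  # iteration count suggested by the theory
    for i in range(1, n_iter + 1):
        dists = np.linalg.norm(X - c, axis=1)
        far = int(np.argmax(dists))        # farthest point joins the core set
        core_idx.add(far)
        c = c + (X[far] - c) / (i + 1)     # shrinking step toward that point
    radius = np.linalg.norm(X - c, axis=1).max()
    return c, radius, sorted(core_idx)

# Toy usage: 10,000 points are summarized by a handful of core points.
X = np.random.RandomState(0).randn(10000, 5)
center, radius, core = coreset_meb(X, eps=0.05)
print(radius, len(core))
```

The size of the core set depends on eps rather than on the number of samples, which is what makes the CVM-style reduction attractive for large datasets.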
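For the FDA-SVDD step in (2), the abstract only states that the kernel-space center of the SVDD sphere is replaced by its preimage in the original space. One common way to approximate such a preimage under a Gaussian kernel is a fixed-point iteration; the sketch below assumes that choice, and the names center_preimage, fast_decide, sv, alpha, gamma and threshold are illustrative rather than taken from the dissertation.

```python
import numpy as np

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel value(s) between x (or rows of x) and y."""
    return np.exp(-gamma * np.sum((x - y) ** 2, axis=-1))

def center_preimage(sv, alpha, gamma, n_iter=50):
    """Fixed-point iteration for an approximate preimage z of the
    kernel-space center c = sum_i alpha_i * phi(sv_i) under an RBF kernel."""
    z = np.average(sv, axis=0, weights=alpha)        # start at the weighted input-space mean
    for _ in range(n_iter):
        w = alpha * rbf(sv, z, gamma)                # kernel-weighted coefficients
        if w.sum() < 1e-12:
            break
        z = (w[:, None] * sv).sum(axis=0) / w.sum()  # fixed-point update
    return z

def fast_decide(x, z, threshold, gamma):
    """O(1) SVDD-style decision: the distance to phi(z) replaces the distance
    to the true center, so no sum over support vectors is needed at test time."""
    dist2 = 2.0 - 2.0 * rbf(x, z, gamma)             # ||phi(x) - phi(z)||^2 for an RBF kernel
    return dist2 <= threshold
```

Because the test-time distance involves a single kernel evaluation against z, the per-sample decision cost no longer grows with the number of support vectors, and the original support vectors do not have to be exposed at decision time.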
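The FMELSVM recipe in (3), fitting a nonlinear decision function as a combination of RBFs by gradient descent on the cross-entropy error, can be illustrated with a plain RBF-basis logistic model. This is only a sketch of that general recipe, not the dissertation's exact ensemble of LSVMs; rbf_features, fit_rbf_logistic, the choice of centers and the hyperparameters are all assumptions.

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Map inputs to RBF basis activations; the centers stand in for the
    component functions being combined."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def fit_rbf_logistic(X, y, centers, gamma=1.0, lr=0.1, epochs=200):
    """Gradient descent on the cross-entropy error (equivalently, ascent on
    the log likelihood) of a logistic model over RBF features."""
    Phi = np.hstack([rbf_features(X, centers, gamma), np.ones((len(X), 1))])  # add bias column
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Phi @ w))   # predicted probabilities
        grad = Phi.T @ (p - y) / len(y)      # gradient of the cross-entropy loss
        w -= lr * grad
    return w

def predict(X, centers, gamma, w):
    Phi = np.hstack([rbf_features(X, centers, gamma), np.ones((len(X), 1))])
    return (1.0 / (1.0 + np.exp(-Phi @ w)) >= 0.5).astype(int)
```

Labels y are assumed to be 0/1; once the weights are learned, prediction reduces to evaluating the RBF activations and a single dot product.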
Keywords/Search Tags: Kernel method, Support Vector Machine (SVM), Support Vector Data Description (SVDD), Core Vector Machine (CVM), Minimum Enclosing Ball (MEB), Center-Constrained MEB, Large datasets, Privacy preservation