
Collaborative Classification Based On Statistical Learning And Its Application To Privacy-preserving

Posted on: 2012-08-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z C Zhang
Full Text: PDF
GTID: 1118330368489482
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Pattern classification often needs to handle various kinds of patterns, which makes it difficult to build a single effective classifier. Every object has both local and global features, and the two are connected yet differ in accessibility, availability, and accuracy. How to collaboratively utilize both local and global information has recently become an important research topic.

On the other hand, classification tasks often face data that are distributed across different parties. Traditional classifiers assume that all parties' data can be freely accessed and centralized at a data center; nowadays, privacy concerns may prevent the parties from directly sharing those data. How to train and classify effectively without disclosing private information has become an active research topic.

Motivated by these two topics, this thesis studies collaborative local-and-global learning and its applications in privacy preservation, in the following three parts.

First, on local and global learning, three classification machines are proposed:

(a) A novel large margin classifier, the Collaborative Classification Machine with Local and Global Information (C2M), inspired by the covariance matrix, which states the data direction globally. This model divides the whole global data into two independent models, and the final decision boundary is obtained by collaboratively combining the two hyperplanes learned from them. C2M can be solved individually as a Quadratic Programming (QP) problem; for a training set with N samples, the total training time complexity is O(N^3), faster than the O(N^4) of the existing Maxi-Min Margin Machine (M4). We also provide a geometrical interpretation and show that C2M can robustly utilize the global information of data sets with overlapping class margins, where M4 loses it.
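The idea of collaboratively combining two individually learned hyperplanes into one decision boundary can be illustrated with a minimal sketch. This is a toy illustration, not the thesis's actual C2M QP formulation: the data, the perceptron-style training of each hyperplane, and the margin-averaging rule are all assumptions made here for demonstration.

```python
import numpy as np

def train_hyperplane(X, y, lr=0.1, epochs=100):
    """Learn one hyperplane (w, b) with a simple perceptron rule."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified: push the plane
                w += lr * yi * xi
                b += lr * yi
    return w, b

def combined_decision(x, planes):
    """Collaborative decision: average the normalized margins of all planes."""
    margins = [(w @ x + b) / (np.linalg.norm(w) + 1e-12) for w, b in planes]
    return np.sign(np.mean(margins))

# Toy two-class data: two well-separated Gaussian clouds.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.3, (20, 2)),
               rng.normal([-2, -2], 0.3, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

# Two "independent models": here simply two interleaved halves of the data,
# each producing its own hyperplane; the final boundary combines both.
planes = [train_hyperplane(X[::2], y[::2]),
          train_hyperplane(X[1::2], y[1::2])]

print(combined_decision(np.array([2.0, 2.0]), planes))    # expected +1
print(combined_decision(np.array([-2.0, -2.0]), planes))  # expected -1
```

Each half of the data plays the role of one of the two independent models; the averaging of normalized margins stands in for the collaborative combination step.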
We also exploit the kernel trick and extend C2M to a nonlinear version. Moreover, we show that C2M can be transformed into the standard Support Vector Machine (SVM) model and thus solved by the speed-up algorithms widely used for SVMs, and we propose four indices to numerically evaluate the global covariance matrix's contribution to a classifier.

(b) To handle classification tasks with plenty of normal examples and very few abnormal examples, a Covariance Preserving Classifier for Novelty Detection (CP-ND) is proposed. In this model, the covariance of the normal examples is used to preserve the statistical distribution of the normal data while the margin between the decision hyperplane and the abnormal points is maximized; its dual problem can likewise be solved as a QP problem. The three parameters ν, ν1, and ν2 introduced by this classifier can be used to tune the training misclassification rate and the fraction of support vectors.

(c) Inspired by the typical local-and-global learning machine M4 and by the idea of Locality Preserving Projections (LPP), we propose a novel large margin classifier, the Generalized Locality Preserving Maxi-Min Margin Machine (GLPM). Its within-class matrices are constructed from the labeled training points in a supervised way and then used to build the classifier; these matrices preserve the intra-class manifold of the training set, just as the covariance matrices indicate the global projection direction in the M4 model. The connections among GLPM, M4, and LDA are also analyzed theoretically.

Second, we focus on speeding up decision functions based on Support Vectors (SVs): fewer SVs mean greater sparseness and higher classification speed.
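The supervised within-class construction that GLPM builds on can be illustrated with the standard within-class scatter matrix from discriminant analysis. This is a generic sketch only; the thesis's exact GLPM matrices may differ, for instance by weighting neighbors with LPP-style affinities.

```python
import numpy as np

def within_class_scatter(X, y):
    """S_w = sum over classes c of sum over x in c of (x - mu_c)(x - mu_c)^T,
    built only from labeled training points (supervised)."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for c in np.unique(y):
        diff = X[y == c] - X[y == c].mean(axis=0)
        S_w += diff.T @ diff  # accumulate intra-class spread of class c
    return S_w

# Tiny labeled training set: two classes in 2-D.
X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 0.5], [3.2, 0.7]])
y = np.array([0, 0, 1, 1])
print(within_class_scatter(X, y))  # captures intra-class spread only
```

Because each class is centered on its own mean, the matrix reflects only intra-class structure, which is what lets such matrices preserve the class manifolds rather than the global spread.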
Based on the sparsity of SVs, we prove that, when clustering the original SVs, the minimal upper bound of the error between the original decision function and the fast decision function is achieved by K-means clustering the original SVs in input space. A new algorithm, the Fast Decision Algorithm of Support Vector Machine (FD-SVM), is then proposed: K-means clusters a dense SV set into a sparse one, the cluster centers serve as the new SVs, and a Quadratic Programming model is built to obtain the optimal coefficients of the new sparse SVs by minimizing the classification gap between SVM and FD-SVM.

Finally, inspired by the fact that the mean value and covariance matrix state a data set's location and direction globally, and that sharing this global information with others does not disclose one's own privacy, we propose a novel two-party privacy-preserving classification solution, the Collaborative Classification Mechanism for Label-Distributed Privacy-Preserving data (LP2M). This model collaboratively trains the decision boundary from two hyperplanes, each constructed individually from one party's own private information and the counter-party's global information. We show that LP2M can be transformed into the existing Minimax Probability Machine (MPM), SVM, and M4 models when the private data satisfy certain conditions, and we propose secure training and test algorithms. Moreover, to handle the more common horizontally partitioned data, LP2M is extended to HP2M.
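The privacy argument above, that a party can reveal its class mean and covariance without exposing individual records, can be sketched as follows. This is a simplified illustration, not the thesis's LP2M protocol: the MPM-style hyperplane w ∝ (Σ₊+Σ₋)⁻¹(μ₊−μ₋), and the bias that bisects the class means, are used here only as stand-ins for the collaboratively trained boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

# Party A holds only the positive-class records, Party B only the negative.
X_a = rng.normal([2, 2], 0.5, (50, 2))    # private to A
X_b = rng.normal([-2, -2], 0.5, (50, 2))  # private to B

def global_stats(X):
    """The only information a party shares: mean and covariance."""
    return X.mean(axis=0), np.cov(X, rowvar=False)

mu_a, cov_a = global_stats(X_a)   # shared by A
mu_b, cov_b = global_stats(X_b)   # shared by B

# Either party can now build an MPM-style hyperplane from the shared
# global statistics alone -- no raw records cross the party boundary.
w = np.linalg.solve(cov_a + cov_b, mu_a - mu_b)
b = -w @ (mu_a + mu_b) / 2.0      # bias bisecting the two class means

def classify(x):
    return 1 if w @ x + b > 0 else -1

print(classify(np.array([2.0, 2.0])))    # expected +1
print(classify(np.array([-2.0, -2.0])))  # expected -1
```

Note that only `mu_*` and `cov_*` are exchanged; the raw arrays `X_a` and `X_b` never leave their owners, which is the intuition behind treating mean and covariance as shareable global information.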
Keywords/Search Tags: Pattern classification, Global and local learning, Collaborative learning, Locality preserving, Fast classification, Privacy preserving