
The Study Of Classification Methods And Its Applications In Web Mining Based On Statistical Learning

Posted on: 2013-09-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J W Tao
Full Text: PDF
GTID: 1228330395964897
Subject: Light Industry Information Technology and Engineering
Abstract/Summary:
Recently, pattern recognition based on statistical learning theory has become an important and deeply studied field in machine learning, and related techniques have been successfully applied in many areas. However, pattern recognition still faces many challenges as statistical learning theory develops, and many issues call for deeper exploration in specific application domains such as Web data mining. Robust feature dimension reduction based on manifold learning, data-dependent SVM learning, and domain transfer learning are three important such topics. Motivated by these challenges, this study addresses several issues in three parts, as follows.

The first part, consisting of Chapter 2, addresses a drawback of the Locally Linear Embedding (LLE) algorithm: its sensitivity to noise and outliers. A novel L1-norm based LLE (L1-LLE) algorithm is proposed, which is robust to outliers because the L1-norm is less sensitive to them than the L2-norm. The proposed L1-norm optimization technique is intuitive, simple, and easy to implement, and it is proven to find a globally minimal solution. The method is applied to several data sets and its performance is compared with that of other conventional methods.

The second part, consisting of Chapters 3 through 6, discusses how to improve the performance of support vector machines by simultaneously considering the between-class margin and within-class cohesion. In Chapter 3, a novel maximal margin support vector machine with magnetic field effect (MFSVM) is proposed for pattern classification. In the Mercer-induced feature space, MFSVM can effectively solve one-class and binary classification problems.
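The L1-LLE method of the first part above modifies the reconstruction-weight step of classical LLE. As a point of reference, the sketch below computes the standard L2 reconstruction weights for a single point; the data and regularization value are illustrative, and the thesis's contribution is precisely to replace this L2 objective with an L1-norm one (not shown here), which is less sensitive to outlying neighbors.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Classical (L2) LLE reconstruction weights for one point.

    Minimizes ||x - sum_j w_j * neighbors[j]||_2^2 subject to sum_j w_j = 1.
    L1-LLE replaces this L2 objective with an L1-norm one to reduce
    sensitivity to outliers; the closed form below is the L2 solution only.
    """
    Z = neighbors - x                 # (k, d) offsets from the point
    C = Z @ Z.T                       # local Gram matrix, (k, k)
    C = C + reg * np.trace(C) * np.eye(len(neighbors))  # regularize
    w = np.linalg.solve(C, np.ones(len(neighbors)))
    return w / w.sum()                # enforce the sum-to-one constraint

# Toy check: a point that is exactly the mean of its three neighbors
nbrs = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
x = nbrs.mean(axis=0)
w = lle_weights(x, nbrs)
print(np.round(w, 3))  # → [0.333 0.333 0.333], perfect reconstruction
```

These per-point weights are then held fixed while LLE solves for low-dimensional coordinates that preserve them.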
By introducing a minimum q-magnetic field tube, the basic idea of MFSVM is to find an optimal hyperplane with a magnetic field effect such that one class (the normal patterns) is enclosed in the q-magnetic field tube by magnetic attraction, while the margin between the q-magnetic field tube and the other class (the abnormal patterns) is made as large as possible by magnetic repulsion. This implements both a maximal between-class margin and a minimal within-class volume, improving the generalization capability of the proposed method.

Chapter 4 addresses a drawback of state-of-the-art pattern classifiers: they cannot efficiently preserve the local geometrical structure or the diversity (discriminative) information of data points embedded in a high-dimensional space, both of which are useful for pattern recognition. A novel Locality-Preserved Maximum Information Variance v-Support Vector Machine (v-LPMIVSVM) algorithm based on manifold learning is presented to address these problems. The v-LPMIVSVM introduces a within-locality homogeneous scatter, which measures the within-locality manifold information of the data points, and a within-locality heterogeneous scatter, which measures their within-locality diversity information, and obtains an optimal classifier with an optimal projection weight vector by minimizing the former while maximizing the latter. Meanwhile, v-LPMIVSVM adopts the geodesic distance metric to measure distances between data points in the manifold space, since only the geodesic distance reflects the true geometry of the manifold. In addition, a parameter is introduced that controls both the upper bound on the fraction of margin errors and the lower bound on the fraction of support vectors, improving the generalization capacity of the proposed v-LPMIVSVM.
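The parameter described at the end of the paragraph above is the standard ν of ν-SVMs (Schölkopf et al.), which v-LPMIVSVM inherits. The thesis's classifier itself has no off-the-shelf implementation, but scikit-learn's NuSVC illustrates the ν property on synthetic data; the data and ν value here are illustrative choices.

```python
import numpy as np
from sklearn.svm import NuSVC

# Two overlapping Gaussian classes (illustrative synthetic data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 2)),
               rng.normal(+1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

nu = 0.2
clf = NuSVC(nu=nu, kernel="rbf").fit(X, y)

# nu upper-bounds the fraction of margin errors and
# lower-bounds the fraction of support vectors.
frac_sv = clf.n_support_.sum() / len(X)
print(frac_sv >= nu)  # → True
```

This makes ν directly interpretable, unlike the C of a standard soft-margin SVM, which is what motivates its use in the v-LPMIVSVM formulation.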
In Chapter 5, inspired by support vector machines for classification and the small sphere and large margin method, we present a novel large margin minimal reduced enclosing ball learning machine (LMMREB) for pattern classification. It improves the classification performance of gap-tolerant classifiers by constructing a minimal enclosing hypersphere that separates the data with maximal margin and minimal enclosing volume in the Mercer-induced feature space. The basic idea is to find two optimal minimal reduced enclosing balls by adjusting a reduction factor q such that each of the binary classes is enclosed by one of the balls and the margin between each class and the reduced enclosing ball of the other is maximized, again implementing both a maximal between-class margin and a minimal within-class volume.

In Chapter 6, to deal with several problems of classical support vector machines, such as over-fitting caused by outliers, class-imbalance learning, and loss of the statistical information of the training examples, we present a novel classifier called the total margin based fuzzy hypersphere learning machine (TMF-SSLM). It constructs a minimal hypersphere in a Mercer kernel-induced feature space so that one class of the binary patterns is enclosed in the hypersphere while the other is separated from it with maximal margin, implementing both a maximal between-class margin and a minimal within-class volume. TMF-SSLM solves not only the over-fitting problem caused by outliers, through fuzzification of the penalty and a total margin algorithm, but also the imbalanced-dataset problem, through a different-cost algorithm, thus obtaining a lower generalization error bound.

The third part, consisting of Chapters 7 and 8, explores several issues in domain adaptation learning in depth.
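The enclosing-sphere geometry underlying LMMREB and TMF-SSLM above can be pictured with a naive hard-margin sketch in the linear (input) space: enclose one class in a sphere and measure the gap to the other class. The thesis's methods instead solve a kernelized optimization with slack variables and a reduction factor; the centroid-based sphere and the synthetic data below are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.5, (50, 2))   # "normal" class, to be enclosed
B = rng.normal(3.0, 0.5, (50, 2))   # other class, pushed outside

# Naive enclosing sphere: centroid center, radius covering all of A.
center = A.mean(axis=0)
radius = np.linalg.norm(A - center, axis=1).max()

# Margin: how far the closest point of B lies outside the sphere.
margin = np.linalg.norm(B - center, axis=1).min() - radius
print(margin > 0)  # → True: B is separated from the sphere enclosing A
```

Shrinking the radius (the role of the reduction factor q) trades enclosed volume of one class against margin to the other, which is exactly the balance the optimization in Chapters 5 and 6 tunes.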
In Chapter 7, to address a drawback of existing domain adaptation learning methods, which may not work well when only the distribution mean discrepancy between the source and target domains is minimized, we design a novel domain adaptation learning method based on the structural risk minimization model, called DAKSVM (kernel support vector machine for domain adaptation), with a support vector machine (SVM) variant and a least-squares SVM (LS-SVM) variant called LSDAKSVM. These methods effectively minimize both the distribution mean discrepancy and the distribution scatter discrepancy between the source and target domains in a reproducing kernel Hilbert space, which in turn improves classification performance.

In Chapter 8, to address several remaining problems in domain adaptation learning, we propose a novel Kernel Distribution Consistency based Local Domain Adaptation Classifier (KDC-LDAC). First, in a universal reproducing kernel Hilbert space (URKHS), KDC-LDAC trains a kernel distribution consistency regularized domain adaptation support vector machine based on the structural risk minimization model. Second, following the idea of local learning, the proposed method predicts the label of each data point in the target domain from its neighbors and their labels in the URKHS. Last but not least, KDC-LDAC learns a discriminant function to classify the unseen data in the target domain using the training data well predicted in the local learning procedure.

Finally, Chapter 9 concludes our overall study.
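The distribution mean discrepancy that Chapter 7 minimizes corresponds to the distance between kernel mean embeddings of the source and target distributions in an RKHS, commonly estimated as the (squared) Maximum Mean Discrepancy. The sketch below is a standard biased MMD estimate with an RBF kernel; the kernel width and synthetic domains are illustrative, and the thesis's DAKSVM additionally penalizes a scatter discrepancy not shown here.

```python
import numpy as np

def mmd2_rbf(Xs, Xt, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy, RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    # ||mean embedding of Xs - mean embedding of Xt||^2 in the RKHS
    return k(Xs, Xs).mean() + k(Xt, Xt).mean() - 2.0 * k(Xs, Xt).mean()

rng = np.random.default_rng(2)
src = rng.normal(0.0, 1.0, (200, 2))       # source domain
tgt_near = rng.normal(0.2, 1.0, (200, 2))  # slightly shifted target
tgt_far = rng.normal(2.0, 1.0, (200, 2))   # strongly shifted target

print(mmd2_rbf(src, tgt_near) < mmd2_rbf(src, tgt_far))  # → True
```

A larger MMD signals a larger domain gap; domain adaptation methods of this family add such a term to the structural risk so the learned representation aligns the two domains.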
Keywords/Search Tags: Feature dimension reduction, Manifold learning, Locally Linear Embedding, Locality preserving projection, Support vector machine, Kernel learning, Fuzzy support vector machine, Margin based support vector machine, Gap-tolerant support vector machine