Contributions To Classification And Clustering Methods Based On Parzen Window Density Estimation

Posted on:2014-11-13

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W H Ying

GTID:1268330401955050

Subject:Light Industry Information Technology and Engineering

Abstract/Summary:

Pattern recognition (PR) is an important research area in artificial intelligence (AI), and classification and clustering are two of its fundamental topics. Because patterns in the real world are so diverse, no single method can solve every classification and clustering problem, which brings both challenges and opportunities to research on classification and clustering methods. Parzen Window density estimation, a representative density estimation method, has also attracted widespread attention in recent years. If the data distribution of a problem can be estimated accurately, the recognition task benefits greatly.

Based on Parzen Window density estimation, this dissertation studies several active research topics, including classification and clustering on large-scale datasets, learning on unbalanced datasets, and domain adaptation. The natural relationships between the proposed algorithms and Parzen Window density estimation are also revealed. The main contributions are:

(1) A kernel classification algorithm (MDL2KC) is proposed, based on the theory of maximum difference of densities and the Parzen Window density estimation method. MDL2KC not only keeps the estimated difference of densities close to the true difference of densities but also maximizes the difference of densities between the two classes, which further improves classification performance.

(2) A new linear representation of the similarity between a sample and a class is defined, and this representation is proved to be, in essence, a Parzen Window density estimate. Based on it, a new classification method called the difference of similarity support vector machine (DSSVM) is proposed. DSSVM seeks the best linear representation of the total similarity between any sample and a particular class. A new optimization problem is obtained from the sparsity of the linear representation and the maximum margin of the difference of similarity.
Additionally, DSSVM can be equivalently formulated as a center-constrained minimum enclosing ball problem, so it can be extended to the difference of similarity core vector machine (DSCVM) by introducing the fast learning theory of minimum enclosing balls, in order to handle classification on large datasets.

(3) Based on the local synchronization phenomena in dynamical systems, a novel fast adaptive clustering algorithm, FAKCS, is proposed. FAKCS first introduces a new method based on RSDE and CCMEB techniques to extract samples from the original dataset. It then clusters adaptively, using the Davies-Bouldin cluster criterion and a new order parameter that measures the degree of local synchronization. Moreover, the relationship between the new order parameter and Parzen Window density estimation is established, which reveals the probability-density nature of local synchronization.

(4) A clustering algorithm named Orthogonal Fuzzy k-Plane Clustering (OFkPC) is presented, which introduces an orthogonality constraint into Fuzzy k-Plane Clustering (FKPC). Like kPC and FKPC, OFkPC uses k groups of hyperplanes as the cluster prototypes. Following the idea behind kPC and FKPC, these hyperplanes are built to distinguish samples in different classes, so the matrices constructed from the normal vectors of these hyperplanes can be used for dimensionality reduction.

(5) A projected maximum divergence discrepancy distance measure is proposed. Based on structural risk minimization theory and this distance measure, a support vector machine based on difference of divergence (DSSVM) is also proposed.
The proposed approach keeps the confidence risk under control on the target domain, which can improve the algorithm's generalization ability.

(6) To solve the domain adaptation problem on unbalanced datasets, this dissertation employs the proposed distribution discrepancy measure PMMDDMCD, which takes into account the label information of samples in the source domain, and proposes a novel domain adaptation learning method based on the structural risk minimization principle, called the support vector machine for domain adaptation based on class distribution (CDASVM). CDASVM is further extended to MSCDASVM, which handles domain adaptation from multiple sources. Theoretical derivation proves that PMMDDMCD reflects a distribution discrepancy between domains estimated by the Parzen Window density estimation method. Because it incorporates class information, it can measure the distribution discrepancy on unbalanced datasets more effectively.
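
For readers unfamiliar with the estimator that underlies all six contributions, the following is a minimal sketch of plain Parzen Window density estimation and of a two-class decision rule that takes the sign of the prior-weighted difference of the estimated densities. It is not MDL2KC, DSSVM, or any other algorithm from the dissertation. The Gaussian kernel, the bandwidth `h`, the function names, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def parzen_density(x, samples, h):
    """Gaussian Parzen window estimate of p(x).

    x       : query point(s), shape (d,) or (m, d)
    samples : training samples, shape (n, d)
    h       : bandwidth (window width), an assumed free parameter
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))   # (m, d)
    samples = np.asarray(samples, dtype=float)      # (n, d)
    d = samples.shape[1]
    # Squared distance between every query point and every sample: (m, n)
    sq_dist = np.sum((x[:, None, :] - samples[None, :, :]) ** 2, axis=2)
    # Normalizing constant of an isotropic Gaussian kernel with width h
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    # Average the kernel values over all samples
    return np.exp(-sq_dist / (2.0 * h ** 2)).mean(axis=1) / norm

def density_difference_classify(x, pos, neg, h):
    """Label +1 where the prior-weighted density of `pos` exceeds that of `neg`."""
    prior_pos = len(pos) / (len(pos) + len(neg))
    prior_neg = 1.0 - prior_pos
    diff = prior_pos * parzen_density(x, pos, h) - prior_neg * parzen_density(x, neg, h)
    return np.where(diff >= 0, 1, -1)

# Illustrative synthetic data: two well-separated 2-D Gaussian classes.
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, size=(200, 2))
neg = rng.normal(loc=-2.0, size=(200, 2))
queries = np.array([[2.0, 2.0], [-2.0, -2.0]])
labels = density_difference_classify(queries, pos, neg, h=0.5)
print(labels)
```

The cost of each query grows linearly with the number of training samples, which is one reason the abstract turns to sample extraction and core vector machine techniques for large datasets.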

Keywords/Search Tags:

Parzen Window density estimation, Core set vector machine, Unbalanced dataset, Synchronization, Domain adaptation, Minimum enclosing ball, Clustering, Classification

Related items

1	A Study On Domain Adaptation Algorithm And Its Application
2	Study On Clustering For Large Data Sets And Its Applications
3	A Study Of Support Vector Classifiers Based On The Theory Of Minimum Enclosing Ball
4	Research On Fast Training Method Based On Core Vector Machine And Support Vector Machine
5	A Study On Cross-domain Classification And Its Application
6	Study On Training And Simplifying Algorithms Of Support Vector Classification
7	Research And Application Of Discriminative Dictionary Learning Algorithms Based On Data Representation
8	Regularized Angular Margin Core Vector Machine
9	A Study On Classification Method Based On Integrated Utilization Of The Labeled And/or The Unlabeled Data
10	The Research On RBF-ELM Two-phase Learning Algorithm