
Algorithm Study On Non-Parametric Kernel Density Clustering And Feature Extraction

Posted on: 2010-01-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X T Yuan
Full Text: PDF
GTID: 1118360302470950
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Cluster analysis is one of the fundamental problems of data analysis. It aims to separate a finite data set into a discrete set of "natural" hidden data structures with little or no prior knowledge. With the rapid development of computational techniques, cluster analysis has become an attractive and challenging research topic in a wide variety of fields, ranging from machine learning and artificial intelligence to pattern recognition and data mining. Feature extraction applies transformations to generate useful and novel features from the original ones, so that clustering or classification tasks become easier. This thesis focuses on clustering and feature extraction methods based on non-parametric kernel density estimation. We begin with a problem description and background review of cluster analysis and feature extraction. We then systematically introduce the Mean-Shift (MS) clustering method and the related theories that form the basis of this work. The main contributions of this thesis are twofold:

1. Novel theoretical interpretation and algorithmic extensions of the MS clustering method. We give a deep analysis of the mathematical essence behind MS and its algorithmic extensions. The key contributions of this part are:
- When the kernel function is convex, we prove that MS is equivalent to a half-quadratic (HQ) optimization procedure for the kernel density estimator, which explains the elegant numerical performance of MS (a sketch of the basic fixed-point iteration is given after the abstract).
- Within the proposed HQ analysis framework, we revisit the quadratic bounding optimization (QBO) nature of MS. Motivated by the QBO perspective, we develop a simple yet efficient kernel density set covering algorithm that covers a given data set with a sparse group of hyperellipsoids. By iteratively running the proposed set covering algorithm on a query set, we obtain an agglomerative non-parametric kernel density clustering algorithm, called Agglo-MS, which can significantly accelerate the MS clustering procedure. As extensions of Agglo-MS, we develop an incremental non-parametric kernel density clustering algorithm, IAgglo-MS, and a constrained non-parametric kernel density clustering algorithm, CAgglo-MS.
- Furthermore, we extend classical batch MS to a stochastic gradient version, together with an asymptotic property analysis. Our stochastic gradient Mean-Shift can be applied to efficient kernel density mode-seeking on large-scale data sets or data streams (see the second sketch after the abstract).
Extensive comparative experiments on synthetic and real-life data sets validate the advantages of the presented algorithmic extensions of Mean-Shift clustering.

2. A robust feature extraction framework based on information-theoretic learning. We present a robust feature extraction framework called Renyi's Entropy Discriminant Analysis (REDA). Its objective pursues two targets, motivated by the Renyi quadratic entropy of the features and the Renyi cross entropy between features and class labels, respectively. The Renyi quadratic/cross entropy is estimated with the non-parametric kernel density method (see the entropy estimator sketch after the abstract). The resulting objective function inherits robustness from both the redescending M-estimator and manifold regularization, and can be efficiently optimized via half-quadratic optimization in an iterative manner. In addition, popular feature extraction algorithms such as LPP, SRDA and LapRLS are shown to be special cases of this framework.
Extensive comparison experiments on several real-world data sets, with randomly contaminated features or labels, validate the encouraging gain in algorithmic robustness delivered by the proposed framework.
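
For concreteness, the following is a minimal sketch of the classical fixed-point Mean-Shift iteration referred to in contribution 1, written in Python/NumPy with a Gaussian kernel. The function name, bandwidth, tolerance, and iteration cap are illustrative choices, not values from the thesis.

```python
import numpy as np

def mean_shift_mode(x, data, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Seek the kernel-density mode reached from the starting point x.

    Each update moves x to the Gaussian-kernel-weighted mean of the data;
    this is the classical fixed-point Mean-Shift step.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    for _ in range(max_iter):
        diff = data - x                                      # (n, d)
        w = np.exp(-0.5 * np.sum(diff**2, axis=1) / bandwidth**2)
        x_new = w @ data / w.sum()                           # weighted mean
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Example (hypothetical data): points converging to the same mode form a cluster.
# rng = np.random.default_rng(0)
# data = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
# modes = np.array([mean_shift_mode(p, data, bandwidth=0.5) for p in data])
```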
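One plausible reading of the stochastic gradient extension mentioned in contribution 1 replaces the full-data mean-shift step with a step computed on a random mini-batch and a decaying step size. The mini-batch update and the 1/t schedule below are assumptions made for illustration, not the exact algorithm of the thesis.

```python
import numpy as np

def stochastic_mean_shift_mode(x, data, bandwidth=1.0, batch_size=64,
                               n_steps=2000, step0=1.0, seed=0):
    """Mode-seeking with mini-batch mean-shift steps (illustrative sketch).

    At step t, a random mini-batch estimates the mean-shift direction and
    the point moves a fraction eta_t of the way, Robbins-Monro style.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).copy()
    data = np.asarray(data, dtype=float)
    n = len(data)
    for t in range(1, n_steps + 1):
        batch = data[rng.choice(n, size=min(batch_size, n), replace=False)]
        diff = batch - x
        w = np.exp(-0.5 * np.sum(diff**2, axis=1) / bandwidth**2)
        if w.sum() < 1e-12:                  # batch carries no kernel mass near x
            continue
        shift = w @ batch / w.sum() - x      # mini-batch mean-shift vector
        eta = step0 / t                      # decaying step size (assumed schedule)
        x = x + eta * shift
    return x
```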
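The Renyi quadratic entropy and cross entropy used by REDA are commonly estimated through Parzen windows via the so-called information potential. The sketch below follows that standard information-theoretic-learning estimator with a Gaussian kernel; the bandwidth is an illustrative parameter, and the full REDA objective is not reproduced here.

```python
import numpy as np

def _gaussian_gram(A, B, sigma):
    """Pairwise Gaussian kernel values G_{sigma*sqrt(2)}(a_i - b_j).

    The sqrt(2) factor comes from convolving two Parzen kernels of width sigma.
    """
    d = A.shape[1]
    s2 = 2.0 * sigma**2
    sq = np.sum((A[:, None, :] - B[None, :, :])**2, axis=-1)
    return np.exp(-sq / (2.0 * s2)) / ((2.0 * np.pi * s2) ** (d / 2))

def renyi_quadratic_entropy(X, sigma=1.0):
    """H2(X) = -log(information potential), estimated from samples."""
    X = np.asarray(X, dtype=float)
    return -np.log(_gaussian_gram(X, X, sigma).mean())

def renyi_cross_entropy(X, Y, sigma=1.0):
    """Cross term: -log of the cross information potential between X and Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return -np.log(_gaussian_gram(X, Y, sigma).mean())
```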
Keywords/Search Tags:Cluster Analysis, Non-Parametric Kernel Density Estimation, Mean-Shift, Half-Quadratic Optimization, Stochastic Gradient Learning, Feature Extraction, Renyi's Entropy, Robust Statistics