Algorithmical and geometrical aspects of statistical depth

Posted on:2001-09-05

Degree:Ph.D

Type:Thesis

University:Universitaire Instelling Antwerpen (Belgium)

Candidate:Struyf, Anja

GTID:2468390014958083

Subject:Statistics

Abstract/Summary:

The first part of this thesis focuses on cluster analysis. Cluster analysis methods try to detect whether a data set consists of several groups. Our goal was to adapt a series of standalone Fortran programs, which are widely used in several domains, so that they meet today's standards. For this purpose, we have converted them into an object-oriented library of clustering functions in S-PLUS. This library also contains graphical displays and indices to evaluate the goodness of the clustering.

The remaining chapters of the thesis discuss statistical depth. In statistics, depth generalizes the univariate concept of ranking to other settings, such as multivariate location and regression. The location depth ldepth(θ; X_n) of a point θ relative to a data set X_n = {x_1, ..., x_n} ⊂ R^p measures how centrally θ lies in the data cloud X_n. Points outside the convex hull of X_n have depth equal to zero, boundary points have low depth, and centrally located points have large depth values. The finite-sample definition of the location depth can easily be generalized to any probability distribution P on R^p. The regression depth rdepth(θ; Z_n) measures how well a hyperplane H_θ with coefficients θ fits a data set Z_n = {(x_1, y_1), ..., (x_n, y_n)} in R^p. If the data are well balanced around the hyperplane, then the hyperplane has a large regression depth value, while hyperplanes that do not represent the data well receive a low depth. This depth notion may again be generalized to any probability distribution P on R^p. The location and regression depth turn out to have many similar properties, theoretically as well as computationally.

First we describe an algorithm to compute the location depth of a given point θ when p = 3, as well as algorithms for the regression depth when p = 3 or 4. We prove the exactness of these algorithms.
Their complexity is O(n^(p-1) log n), which grows exponentially in p, making this approach impractical for more than three dimensions. Therefore we also propose approximate algorithms for higher-dimensional data sets. A point with maximal location depth relative to the data set X_n can be used as a robust estimator of location. This deepest location T*_l is a natural generalization of the univariate median to higher dimensions. We construct an approximate algorithm for the deepest location in any dimension.

Moreover, we prove some characterization properties. We show that the empirical distribution of the original data points is uniquely determined by the location depth function and by the regression depth function. We also discuss the relation between the depth function and symmetry properties of the original distribution P, which may be continuous as well as discrete.
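
As a rough illustration of the location depth described above, the following Python sketch approximates ldepth(θ; X_n) by taking the minimum, over a finite set of random directions, of the number of data points in the closed halfspace through θ. This is a generic random-direction heuristic and is not the algorithm of the thesis; the function names approx_location_depth and approx_deepest_point, the parameters n_dirs and seed, and the restriction of the deepest-location search to the data points themselves are all assumptions made for this example. Because only finitely many directions are tried, the returned value can overestimate the true depth but never underestimate it.

```python
import random


def approx_location_depth(theta, points, n_dirs=500, seed=0):
    """Approximate halfspace (location) depth of theta, as a point count.

    Hypothetical sketch: for each random direction u, count the points x
    with u . (x - theta) >= 0 and keep the smallest count found. The result
    is an upper bound on the exact depth.
    """
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    p = len(theta)
    best = len(points)
    for _ in range(n_dirs):
        # Gaussian coordinates give a uniformly distributed direction;
        # normalisation is unnecessary because only the sign matters.
        u = [rng.gauss(0.0, 1.0) for _ in range(p)]
        count = sum(
            1
            for x in points
            if sum(ui * (xi - ti) for ui, xi, ti in zip(u, x, theta)) >= 0.0
        )
        best = min(best, count)
    return best


def approx_deepest_point(points, n_dirs=500, seed=0):
    """Return the data point with the largest approximate depth.

    Assumption for this sketch: candidates are restricted to the data
    points, which is a simplification of a true deepest-location search.
    """
    return max(
        points,
        key=lambda x: approx_location_depth(x, points, n_dirs, seed),
    )


# Small example: four corners of a square, with and without the centre.
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
with_centre = corners + [(0.0, 0.0)]

depth_outside = approx_location_depth((10.0, 10.0), corners)  # outside the hull
depth_centre = approx_location_depth((0.0, 0.0), corners)     # centre of the square
deepest = approx_deepest_point(with_centre)
```

In this example the point (10, 10) lies outside the convex hull and receives depth 0, the centre of the square receives depth 2, and the centre is selected as the deepest of the five data points, which matches the qualitative behaviour stated in the abstract. The cost of the sketch is proportional to n_dirs times n times p per evaluated point, so it trades exactness for applicability in higher dimensions.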

Keywords/Search Tags:

Depth, Data, Distribution
