Robust Nonparametric Multivariate Outlier Identification

Series: Stochastics Seminar
Thursday, September 25, 2008 - 15:00
1 hour (actually 50 minutes)
Location: Skiles 269
Department of Mathematical Sciences, University of Texas at Dallas
The robustness of several nonparametric multivariate "threshold type" outlier identification procedures is studied, employing a masking breakdown point criterion subject to a fixed false positive rate. The procedures are based on four different outlyingness functions: the widely used "Mahalanobis distance" version, a new one based on a "Mahalanobis quantile" function that we introduce, one based on the well-known "halfspace" depth, and one based on the well-known "projection" depth. In this treatment, multivariate location outlyingness functions are formulated as extensions of univariate versions using either "substitution" or "projection pursuit," and an equivalence paradigm relating multivariate depth, outlyingness, quantile, and centered rank functions is applied. Of independent interest, the new "Mahalanobis quantile" outlyingness function is not restricted to have elliptical contours, has a transformation-retransformation representation in terms of the well-known spatial outlyingness function, and upgrades the orthogonal invariance of that function to full affine invariance. Two special tools, also of independent interest, are introduced and applied here: a notion of weak covariance functional, and a very general and flexible formulation of affine equivariance for multivariate quantile functions. The new Mahalanobis quantile function inherits attractive features of the spatial version, such as computational ease and a Bahadur-Kiefer representation. For the particular outlyingness functions under consideration, masking breakdown points are evaluated and compared within a contamination model. It is seen that for threshold-type outlier identification the Mahalanobis distance and projection procedures are superior to the others, although all four procedures are quite suitable for robust ranking of points with respect to outlyingness. Reasons behind these differences are discussed, and directions for further study are indicated.
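As a rough illustration of what a threshold-type procedure looks like, the sketch below computes the sample spatial outlyingness mentioned in the abstract, taken here in its usual form as the norm of the average unit vector pointing from the data toward a point, and flags points whose outlyingness exceeds a cutoff. The function names, the toy data, and the cutoff of 0.8 are assumptions made for this example only. In particular, the cutoff is not calibrated to any false positive rate, and this is not one of the four procedures as analyzed in the talk.

```python
import math

def spatial_outlyingness(x, sample):
    """Norm of the average unit vector from each sample point toward x.

    Values lie in [0, 1); a point equal to x contributes the zero vector.
    """
    total = [0.0] * len(x)
    for p in sample:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm == 0.0:
            continue  # convention: the sign vector of zero is zero
        for j, c in enumerate(diff):
            total[j] += c / norm
    return math.sqrt(sum(t * t for t in total)) / len(sample)

def flag_outliers(sample, threshold):
    """Indices of points whose outlyingness exceeds the (hypothetical) threshold."""
    return [i for i, x in enumerate(sample)
            if spatial_outlyingness(x, sample) > threshold]

# Toy data: five points near the origin and one far away.
sample = [(1, 0), (-1, 0), (0, 1), (0, -1), (0, 0), (10, 10)]
print(flag_outliers(sample, 0.8))  # [5]
```

The same outlyingness values can also be used simply to rank the points from most central to most outlying, without committing to a cutoff.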