How Bandwidth Selection Algorithms Impact Exploratory Data Analysis Using Kernel Density Estimation

Posted on: 2014-09-19
Degree: M.A.
Type: Thesis
University: University of Kansas
Candidate: Harpole, Jared K
Full Text: PDF
GTID: 2458390005495603
Subject: Psychology
Abstract/Summary:
Exploratory data analysis (EDA) is important yet often overlooked in the social and behavioral sciences. Graphical analysis of one's data is central to EDA, and kernel density estimation (KDE) is a viable method for estimating and graphing the underlying density. A central difficulty in using KDE is choosing a bandwidth that yields an accurate representation of the density. The purpose of the present study is to evaluate empirically how the choice of bandwidth in KDE influences recovery of the true density. Simulations compared five bandwidth selection methods [Sheather-Jones plug-in (SJDP), Normal rule of thumb (NROT), Silverman's rule of thumb (SROT), Least squares cross-validation (LSCV), and Biased cross-validation (BCV)] across four true density shapes (Standard Normal, Positively Skewed, Bimodal, and Skewed Bimodal) and eight sample sizes (25, 50, 75, 100, 250, 500, 1000, 2000). Results indicated that SJDP performed best overall, although this held specifically for sample sizes from 250 to 2000. For smaller samples (N = 25 to 100), SROT performed best. Thus, either SJDP or SROT is recommended, depending on the sample size.
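To make the comparison concrete, here is a minimal Python sketch of two of the five selectors, NROT and SROT, applied to a Gaussian-kernel KDE of one simulated standard-normal sample. It is not the thesis's simulation code. The formulas are the common textbook forms of the two rules, and the thesis may define them differently. The function names, the random seed, the sample size of 100, and the use of integrated squared error as the recovery measure are all assumptions made for illustration. SJDP, LSCV, and BCV are omitted because they need iterative or optimization-based computation.

```python
import math
import random
import statistics


def nrot_bandwidth(x):
    """Normal rule of thumb (textbook form): h = 1.06 * sd * n^(-1/5)."""
    n = len(x)
    return 1.06 * statistics.stdev(x) * n ** (-1 / 5)


def srot_bandwidth(x):
    """Silverman's rule (textbook form): h = 0.9 * min(sd, IQR/1.34) * n^(-1/5)."""
    n = len(x)
    q1, _, q3 = statistics.quantiles(x, n=4)  # quartile cut points
    spread = min(statistics.stdev(x), (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(x, h):
    """Return a function that evaluates the Gaussian-kernel KDE of x at a point."""
    norm = 1.0 / (len(x) * h * math.sqrt(2 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - xi) / h) ** 2) for xi in x)

    return density


def standard_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)


def integrated_squared_error(density, lo=-5.0, hi=5.0, steps=400):
    """Midpoint-rule approximation of the integral of (estimate - truth)^2.

    ISE is an assumed recovery measure, chosen here only for illustration.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * dx
        total += (density(t) - standard_normal_pdf(t)) ** 2 * dx
    return total


# One hypothetical replication: N = 100 draws from a standard normal.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]

h_nrot = nrot_bandwidth(sample)
h_srot = srot_bandwidth(sample)
ise_nrot = integrated_squared_error(gaussian_kde(sample, h_nrot))
ise_srot = integrated_squared_error(gaussian_kde(sample, h_srot))

print(f"NROT: h = {h_nrot:.3f}, ISE = {ise_nrot:.5f}")
print(f"SROT: h = {h_srot:.3f}, ISE = {ise_srot:.5f}")
```

A single replication like this says little on its own. A study of the kind described would repeat it many times for each density shape and sample size and then summarize the error distribution for each selector.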
Keywords/Search Tags:Density, Data, Bandwidth, EDA, SJDP, SROT, Using