Bootstrap Methods and Permutation Tests
Thinking about the bootstrap idea
• It might appear that resampling creates new data out of nothing. This seems suspicious. Even
the name “bootstrap” comes from the impossible image of “pulling yourself up by your own
bootstraps.”3 But the resampled observations are not used as if they were new data.
• The bootstrap distribution of the resample means is used only to estimate how the sample
mean of one actual sample of size 1664 would vary because of random sampling.
• Using the same data for two purposes—to estimate a parameter and also to estimate the
variability of the estimate—is perfectly legitimate.
• We do exactly this when we calculate x̄ to estimate μ and then calculate s/√n from the same
data to estimate the variability of x̄. What is new? First of all, we don’t rely on the formula s/√n
to estimate the standard deviation of x̄.
• Instead, we use the ordinary standard deviation of the many x̄-values from our many
resamples.4 Suppose that we take B resamples.
• Call the means of these resamples x̄∗ to distinguish them from the mean x̄ of the original
sample. Find the mean and standard deviation of the x̄∗’s in the usual way.
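• As a rough illustration of the steps above, the following Python sketch draws B resamples with replacement, records each resample mean x̄∗, and then takes the ordinary mean and standard deviation of those values. The data are simulated stand-ins for the actual sample of size 1664, and the names `bootstrap_means`, `n_resamples`, and `seed` are hypothetical choices for this example, not taken from the text.

```python
import numpy as np

def bootstrap_means(sample, n_resamples, seed=0):
    """Return the means of n_resamples resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Each row is one resample of the same size as the original sample.
    resamples = rng.choice(sample, size=(n_resamples, n), replace=True)
    return resamples.mean(axis=1)

# Assumption: simulated data standing in for the actual sample of size 1664.
data_rng = np.random.default_rng(1)
sample = data_rng.exponential(scale=8.0, size=1664)

boot_means = bootstrap_means(sample, n_resamples=1000)   # the B values of x-bar*
mean_of_boot = boot_means.mean()                         # mean of the resample means
sd_of_boot = boot_means.std(ddof=1)                      # ordinary SD, divisor B - 1
formula_se = sample.std(ddof=1) / np.sqrt(sample.size)   # s / sqrt(n), for comparison

print(mean_of_boot, sd_of_boot, formula_se)
```

For a sample this large, the standard deviation of the resample means should land close to s/√n, which is one way to see that the resamples estimate variability rather than supply new data.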
• To make clear that these are the mean and standard deviation of the means of the B resamples
rather than the mean x̄ and standard deviation s of the original sample, we use a distinct notation.
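• The formulas themselves did not survive in this section. One common way to write them, offered here as an assumption rather than as the original notation, labels the two quantities mean_boot and SE_boot, with the sums running over the B resamples and the divisor B − 1 matching the ordinary standard deviation:

```latex
\[
\text{mean}_{\text{boot}} = \frac{1}{B} \sum \bar{x}^{*},
\qquad
\text{SE}_{\text{boot}} = \sqrt{\frac{1}{B-1} \sum \left( \bar{x}^{*} - \text{mean}_{\text{boot}} \right)^{2}}
\]
```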
• These formulas go all the way back to Chapter 1 (look back at “Describing distributions with
numbers,” page 30). Once we have the values x̄∗, we just ask our software for their mean and
standard deviation.
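• We will often apply the bootstrap to statistics other than the sample mean. Here is the general
definition: the bootstrap standard error of a statistic is the standard deviation of the bootstrap
distribution of that statistic.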
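• To show how the same recipe carries over to other statistics, here is a minimal Python sketch in which the statistic is passed in as a function. The helper name `bootstrap_se`, its parameters, and the ten data values are all made up for illustration.

```python
import numpy as np

def bootstrap_se(sample, statistic, n_resamples=1000, seed=0):
    """Standard deviation of `statistic` across resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        # One resample of the same size as the original sample.
        resample = rng.choice(sample, size=n, replace=True)
        stats[b] = statistic(resample)
    # Ordinary standard deviation of the resampled statistics (divisor B - 1).
    return stats.std(ddof=1)

# Assumption: a small made-up data set, used only to exercise the function.
data = np.array([3.1, 4.7, 2.2, 8.9, 5.0, 6.3, 4.1, 7.8, 3.6, 5.5])

se_median = bootstrap_se(data, np.median)   # a statistic with no simple SE formula
se_mean = bootstrap_se(data, np.mean)       # comparable to s / sqrt(n)

print(se_median, se_mean)
```

Passing the statistic as an argument keeps the resampling loop identical whether the quantity of interest is a mean, a median, or something else.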
• Another thing that is new is that we don’t appeal to the central limit theorem or other theory to
tell us that a sampling distributio