# Learning statistics

Status: Notes

Confidence: Very likely

I don’t have a comprehensive guide to statistics to offer. This is an assortment of things I think are important or valuable. It’s really easy to get lost in all the math and lose sight of the underlying logic of interpreting experiments and observations.

**First**: Freedman’s ‘Statistical
Models and Shoe Leather’ drives home the point that statistical
methods are not a replacement for wearing out shoe leather to get the
data you actually need.

Once you’ve got some data, Tukey’s *Exploratory
Data Analysis*¹ is the best guide I know to learning
how to deal with it on its own terms instead of at arm’s length via
prepackaged statistics. The follow-up book, *Data
Analysis and Regression*, is also wonderful and provides lots of
insight into how not to be ruled by your statistical tools.

The next step is understanding the formal basis of inference and
inferring causality. Inference is based on decision theory (a special
case of game theory). I learned it from the first few chapters of
Kiefer’s *Introduction to Statistical Inference*. The best
introduction to issues of causality as we understand them today is
Pearl’s *Causal Inference in Statistics*.
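Here is a minimal sketch of the decision-theoretic framing, for readers who want its shape before opening Kiefer. It uses standard textbook notation, which is an assumption here and not necessarily Kiefer's. Nature fixes an unknown parameter $\theta$. The statistician picks a decision rule $\delta$ that maps data $X$ to an action, and a loss function $L$ scores that action. A rule is judged by its risk, the expected loss when $\theta$ is the truth. A minimax rule, when one exists, minimizes the worst-case risk:

$$
R(\theta, \delta) = \mathbb{E}_{\theta}\!\left[ L\big(\theta, \delta(X)\big) \right],
\qquad
\delta^{*} \in \arg\min_{\delta} \, \sup_{\theta} R(\theta, \delta).
$$

Treating $\theta$ as nature's move and $\delta$ as the statistician's turns this into a two-player zero-sum game. That is the sense in which decision theory is a special case of game theory.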

These are all strange recommendations, though. Nowhere in them will you learn how to do a $t$-test or fit a line in the usual way. They’re more like supplements that work around the problems of a typical course. For the course itself, I have heard good things about this one based on resampling methods. Resampling builds intuition nicely and gets people over their squeamishness about using randomized procedures.
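The resampling idea can be made concrete with a minimal sketch of a two-sample permutation test, a resampling counterpart to the two-sample $t$-test. It is not taken from the course mentioned above. The function name `permutation_test`, its parameters, and the measurements are hypothetical, chosen only for illustration. The sketch uses only Python's standard library.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference mean(a) - mean(b), estimated p-value).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are arbitrary,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed labelling too, so the estimate is never exactly 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical measurements, made up for illustration.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9]
control = [11.0, 12.2, 11.8, 12.5, 11.4]
diff, p = permutation_test(treatment, control)
print(f"difference in means = {diff:.2f}, p ~ {p:.3f}")
```

With five values per group there are 252 ways to split the pooled data. Only 6 of them give a difference at least as large in absolute value: the observed split, two more extreme ones, and their mirror images. The exact p-value is therefore 6/252 ≈ 0.024, and the random-shuffle estimate should land near that. The shuffle-and-count loop is the whole idea. No $t$ distribution is needed, which is part of why resampling is a good vehicle for intuition.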

Meanwhile, Tufte’s four books (*The Visual Display of Quantitative
Information* (1983), *Envisioning Information* (1990),
*Visual Explanations* (1997), and *Beautiful Evidence*
(2006)) are the canonical reading (viewing?) on visualization.

Entirely missing from this list is anything on experimental design. I have no idea what to point people to for it.

¹ As a community, we need a cheap reprint of *Exploratory Data Analysis* and its sequel. Finding copies is expensive and difficult.