madhadron

Learning statistics

Status: Notes
Confidence: Very likely

I don’t have any comprehensive guides to statistics to offer. This is an assortment of things I think are important or valuable. It’s really easy to get lost in all the math and lose the underlying logic of interpreting experiments and observations.

First: Freedman’s ‘Statistical Models and Shoe Leather’ drives home the point that statistical methods are not a replacement for wearing out shoe leather to get the data you actually need.

Once you’ve got some data, Tukey’s Exploratory Data Analysis [1] is the best guide I know to learning how to deal with it on its own terms instead of at arm’s length via prepackaged statistics. The followup book, Data Analysis and Regression, is also wonderful and provides lots of insight on how not to be ruled by your statistical tools.

The next step is understanding the formal basis of inference and inferring causality. Inference is based on decision theory (a special case of game theory). I learned it from the first few chapters of Kiefer’s Introduction to Statistical Inference. The best introduction to issues of causality as we understand them today is Pearl’s Causal Inference in Statistics.

These are all strange recommendations, though. Nowhere in this will you learn how to do a t-test or fit a line in the usual way. They’re like the supplements you need to work around the problems of a typical course. For a course, I have heard good things about this one based on resampling methods. Resampling builds intuition nicely and gets people over their squeamishness about using randomized procedures.
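To make the resampling idea concrete, here is a minimal sketch of a two-sample permutation test using only the Python standard library. It is not taken from the course mentioned above. The function name `permutation_test`, its parameters, and the two small samples are all hypothetical, made up purely for illustration. The logic is that if group labels don't matter, shuffling them should produce differences in means as large as the observed one fairly often.

```python
import random
from statistics import fmean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed difference, estimated p-value).
    Hypothetical helper for illustration; not from any particular library.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    observed = fmean(a) - fmean(b)
    pooled = list(a) + list(b)
    n_a = len(a)

    extreme = 0
    for _ in range(n_resamples):
        # Shuffling the pooled data reassigns group labels at random.
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction: the observed labelling counts as one resample,
    # so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up measurements, only to show the call.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.4, 11.9, 12.8, 11.5, 12.2]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference in means: {observed:.2f}")
print(f"estimated two-sided p-value:  {p_value:.4f}")
```

The whole procedure is a loop and a comparison, which is much of why resampling builds intuition: there is no distribution table to look up, just a count of how often chance alone does as well as the data did.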

Meanwhile, Tufte’s four books, The Visual Display of Quantitative Information (1983), Envisioning Information (1990), Visual Explanations (1997), and Beautiful Evidence (2006), are the canonical reading (viewing?) on visualization.

Totally missing from these notes is any treatment of experimental design. I have no idea what to point people to for this.


  1. As a community, we need a cheap reprint of Exploratory Data Analysis and its sequel. Finding copies is expensive and difficult.