by Fred Ross Last updated: December 6, 2013

One of Casella’s papers^{1} contains a decision theory paradox in interval estimation. I became suspicious of interval estimation a while back, prompted by Savage’s work^{2}, but Casella’s paper uses a cute analytical tool that I haven’t seen elsewhere, and which its authors pass over without comment.

The paradox in the paper depends on the classic loss function for estimating an interval $[a,b]$ that is to contain the true value $\theta$ of some parameter:

$L(\theta, [a,b]) = C \cdot (b-a) - \mathbf{I}_{[a,b]}(\theta)$

where $C$ is a positive constant and $\mathbf{I}_{[a,b]}$ is the indicator function of $[a,b]$ which is one when its argument is in $[a,b]$ and zero otherwise. It turns out that in the most common cases, the standard interval estimators are strictly dominated by estimators that collapse to single points or even to the empty set.
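To see how the linear penalty backfires, here is a toy numerical illustration (the numbers and the choice of $C$ are mine, not from the paper, and the paper’s actual dominating estimators are cleverer than simply reporting the empty set): for an observation $X \sim N(\theta, \sigma^2)$, the usual 95% interval $x \pm 1.96\sigma$ has fixed width and 0.95 coverage, so its risk under the loss above is $C \cdot 2 \cdot 1.96\sigma - 0.95$ regardless of $\theta$, while the empty set always incurs loss zero.

```python
# Risk under L(theta, [a,b]) = C*(b-a) - I(theta in [a,b]) for X ~ N(theta, sigma^2).
# Illustrative only: the standard 95% interval x +/- 1.96*sigma has constant
# width and 0.95 coverage, so its risk is C*2*1.96*sigma - 0.95 for every theta.

C = 0.5  # cost per unit of interval length (arbitrary choice for illustration)

def risk_standard_interval(sigma, C, z=1.96, coverage=0.95):
    """Risk of the fixed-width interval x +/- z*sigma under the linear loss."""
    return C * 2 * z * sigma - coverage

risk_empty = 0.0  # empty set: zero length, never covers, so loss is always 0

for sigma in [0.1, 0.5, 1.0]:
    r = risk_standard_interval(sigma, C)
    better = "standard interval" if r < risk_empty else "empty set"
    print(f"sigma={sigma}: risk={r:+.3f}  ->  {better} wins")
```

Once $\sigma$ is large enough that the length penalty exceeds the coverage gain, the empty set has strictly smaller risk at every $\theta$ — which is the flavor of the paradox.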

Then they replace $C \cdot (b-a)$ with a continuous, increasing function $S(b-a)$ in the loss function and look for conditions on $S$ that will avoid the paradox.

They find first that $\lim_{x \rightarrow \infty} S(x) \leq 1$ must hold to avoid the paradox, and that the limit must in fact equal 1, or else the estimator ends up returning the whole space as the estimated interval, which isn’t very useful. Since we are dealing with loss functions, we can always add a constant without changing the results, so any $S$ to be used in practice must be a cumulative distribution function.
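As a concrete instance of the first condition, take $S$ to be the CDF of an exponential distribution, $S(x) = 1 - e^{-\lambda x}$ (my choice, for illustration — it satisfies the limit condition, though whether it also satisfies the tail condition below is a separate question). It is continuous, increasing, $S(0) = 0$, and $\lim_{x \to \infty} S(x) = 1$, so the width penalty saturates: an arbitrarily wide interval is never penalized more than a missed coverage.

```python
import math

# A bounded size penalty: S is the Exponential(lam) CDF, S(x) = 1 - exp(-lam*x).
# Continuous, increasing, S(0) = 0, and lim_{x -> inf} S(x) = 1.

def S_exponential(x, lam=1.0):
    return 1.0 - math.exp(-lam * x)

def loss(theta, a, b, lam=1.0):
    """Modified loss S(b - a) - I(theta in [a, b])."""
    covers = 1.0 if a <= theta <= b else 0.0
    return S_exponential(b - a, lam) - covers

# The width penalty saturates at 1, so coverage always pays off eventually:
print(loss(0.0, -100.0, 100.0))  # covering, huge width: roughly 1 - 1 = 0
print(loss(0.0, 2.0, 3.0))       # missing, small width: positive
```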

The second condition on that cumulative distribution function is the cute analytic tool that I mentioned. It is a statement about the relative weight of the tails of two cumulative distribution functions. Let $F$ and $G$ be cumulative distribution functions. We define $F$ to be asymptotically heavier tailed than $G$ with the predicate

$\langle\exists x_0, \delta_0 :: \langle \forall x, \delta : x \geq x_0 \wedge \delta \geq \delta_0 : G(x) - F(\delta\cdot x) > 0\rangle\rangle$

where we have introduced the scaling by $\delta$ to keep location and scale from confounding the comparison. Without it, if we were to compare two Gaussian distributions differing only in where they are centered, the predicate would report the one to the right as heavier tailed than the one to the left, which is erroneous. With the scaling in place, no matter how we stretch a Gaussian distribution, it will never be reported as heavier tailed than a Cauchy distribution.
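The Gaussian-versus-Cauchy claim can be spot-checked numerically. This sketch uses my reading of the predicate — $F$ is asymptotically heavier tailed than $G$ when $G(x) - F(\delta x) > 0$ for all large enough $x$ and $\delta$ — with $F$ the standard Cauchy CDF and $G$ a deliberately stretched Gaussian CDF; a finite grid of course only suggests, not proves, the asymptotic statement.

```python
import math

# Spot-check: is the standard Cauchy (F) asymptotically heavier tailed than a
# Gaussian stretched to sigma = 5 (G)?  The predicate (as I read it) asks for
# G(x) - F(delta*x) > 0 for all x >= x0 and delta >= delta0.

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

def gaussian_cdf(x, sigma=1.0):
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

# Check G(x) - F(delta*x) > 0 on a grid of large x and delta >= 1: the
# Gaussian CDF approaches 1 much faster than the Cauchy CDF, so the Cauchy
# tail wins no matter how hard we stretch the Gaussian.
ok = all(
    gaussian_cdf(x, sigma=5.0) - cauchy_cdf(delta * x) > 0
    for x in [20, 50, 100]
    for delta in [1, 2, 10]
)
print("Cauchy heavier-tailed than stretched Gaussian on grid:", ok)
```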

I have yet to find a use for this beyond the proof in which Casella and his coauthors deploy it, but it tickled my fancy.