
Asymptotically heavier tailed distributions


One of Casella’s papers¹ contains a decision theory paradox in interval estimation. I became suspicious of interval estimation a while back, triggered by Savage’s work², but Casella’s paper uses a cute analytical tool that I haven’t seen elsewhere, and which they pass over without comment.

The paradox in the paper depends on the classic loss function for estimating an interval $[a,b]$ which is to contain the true value $\theta$ of some parameter:

$$L(\theta, [a,b]) = C \cdot (b-a) - \mathbf{I}_{[a,b]}(\theta)$$

where $C$ is a positive constant and $\mathbf{I}_{[a,b]}$ is the indicator function of $[a,b]$, which is one when its argument is in $[a,b]$ and zero otherwise. It turns out that in the most common cases, the standard interval estimators are strictly dominated by estimators that collapse to single points or even to the empty set.
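
To see the collapse concretely, here is a quick numeric sketch of my own (not from the paper), assuming a single observation $X \sim N(\theta, 1)$ and the usual interval $[X - z, X + z]$. Under this loss its risk is constant in $\theta$, and once $C$ is large enough the risk is strictly positive, so the empty set, whose risk is identically zero, dominates.

```python
from scipy.stats import norm

C = 0.5   # cost per unit of length; the collapse needs C large enough
z = 1.96  # the usual 95% half-width for X ~ N(theta, 1)

# Risk of [X - z, X + z] under L = C*(b - a) - I:
# E[L] = C * 2z - P(|X - theta| <= z), which does not depend on theta.
coverage = norm.cdf(z) - norm.cdf(-z)  # about 0.95
risk_interval = C * 2 * z - coverage   # about 1.96 - 0.95 = 1.01 > 0

# The empty set has length 0 and never covers, so its loss is always 0.
risk_empty = 0.0

print(risk_interval, risk_empty)  # the empty set has strictly lower risk
```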

Then they replace $C \cdot (b-a)$ with a continuous, increasing function $S(b-a)$ in the loss function and look for conditions on $S$ that will avoid the paradox.

They find first that $\lim_{x \rightarrow \infty} S(x) \leq 1$ must hold to avoid the paradox, and that the limit must equal 1 or the estimator ends up returning the whole space as the estimated interval, which isn’t very useful. Since we are dealing with loss functions, we can always add a constant without changing the results, so any $S$ to be used in practice must be a cumulative distribution function.
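
To see why a bounded penalty helps, here is the same numeric check as above with a hypothetical choice of $S$, the exponential CDF. The empty set now has risk $S(0) = 0$, while the standard interval achieves negative risk, so it is no longer dominated.

```python
import numpy as np
from scipy.stats import norm

# A hypothetical bounded size penalty: the exponential CDF with rate lam.
# It is continuous, increasing, and tends to 1, as the conditions require.
def S(length, lam=0.25):
    return 1.0 - np.exp(-lam * length)

z = 1.96
coverage = norm.cdf(z) - norm.cdf(-z)  # about 0.95

# Risk of [X - z, X + z] under L = S(b - a) - I: S(2z) - coverage.
risk_interval = S(2 * z) - coverage    # about 0.62 - 0.95 < 0
risk_empty = S(0.0)                    # = 0: the empty set no longer wins

print(risk_interval, risk_empty)
```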

The second condition on that cumulative distribution function is the cute analytic tool that I mentioned. It is a statement about the relative weight of the tails of two cumulative distribution functions. Let $F$ and $G$ be cumulative distribution functions. We define $F$ to be asymptotically heavier tailed than $G$ with the predicate

$$\exists x_0, \delta_0\ \textrm{such that}\ \forall x \geq x_0,\ \delta \geq \delta_0,\quad G(x\delta) - F(x) > 0$$

where we have introduced the scaling by $\delta$ to remove any confounding from the distributions’ scale and location. Otherwise, if we were to compare two Gaussian distributions differing only in where they are centered, the predicate would report the one to the right as being heavier tailed than the one to the left, which is erroneous. On the other hand, no matter how we stretch a Gaussian distribution, it will never be reported as heavier tailed than a Cauchy distribution.
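
As a sanity check on the predicate, here is a small numeric sketch of my own using scipy’s CDFs, with $F$ the standard Cauchy and $G$ the standard Gaussian. The grid bounds $x_0 = 5$ and $\delta_0 = 1$ are illustrative choices, not derived ones.

```python
import numpy as np
from scipy.stats import cauchy, norm

xs = np.linspace(5.0, 50.0, 200)  # x >= x_0 = 5
deltas = [1.0, 2.0, 10.0]         # delta >= delta_0 = 1

# Cauchy heavier tailed than Gaussian: G(x*d) - F(x) > 0 on the grid.
print(all(norm.cdf(x * d) - cauchy.cdf(x) > 0
          for x in xs for d in deltas))   # True

# Gaussian heavier tailed than Cauchy: the swapped predicate fails.
print(all(cauchy.cdf(x * d) - norm.cdf(x) > 0
          for x in xs for d in deltas))   # False
```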

I have yet to find a use for this besides the proof in which Casella uses it, but it tickled my fancy.


  1. Casella et al., A paradox in decision-theoretic interval estimation, Statistica Sinica 3 (1993), 141–155.↩︎

  2. Savage, The Foundations of Statistics.↩︎