Asymptotically heavier tailed distributions
One of Casella’s papers[1] contains a decision-theoretic paradox in interval estimation. I became suspicious of interval estimation a while back, triggered by Savage’s work[2], but Casella’s paper uses a cute analytical tool that I haven’t seen elsewhere, and which the authors pass over without comment.
The paradox in the paper depends on the classic loss function for estimating an interval $C$ which is to contain the true value of some parameter $\theta$:

$$L(\theta, C) = k \,\mathrm{vol}(C) - I_C(\theta)$$

where $k$ is a positive constant and $I_C$ is the indicator function of $C$, which is one when its argument is in $C$ and zero otherwise. It turns out that in the most common cases, the standard interval estimators are strictly dominated by estimators that collapse to single points or even to the empty set.
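To make the trade-off in this loss concrete, here is a minimal Python sketch. It assumes the loss has the usual size-minus-coverage form, $k \,\mathrm{vol}(C) - I_C(\theta)$, for a one-dimensional interval. The function name `interval_loss`, the `(lo, hi)` tuple, and the use of `None` for the empty set are illustrative choices of mine, not anything from the paper.

```python
def interval_loss(theta, interval, k):
    """Size-minus-coverage loss: k * length(interval) - 1[theta in interval].

    `interval` is a (lo, hi) tuple, or None for the empty set.
    This form of the loss is an assumption of the sketch.
    """
    if interval is None:
        # The empty set has zero length and never covers theta.
        return 0.0
    lo, hi = interval
    length = hi - lo
    covered = 1.0 if lo <= theta <= hi else 0.0
    return k * length - covered


if __name__ == "__main__":
    theta, k = 0.0, 0.25
    for candidate in [(-1.0, 1.0), (-3.0, 3.0), (0.0, 0.0), None]:
        print(candidate, interval_loss(theta, candidate, k))
```

Once $k$ times the length exceeds one, an interval has positive loss even when it covers $\theta$, so the empty set (with loss zero) beats it. That is the kind of pressure toward degenerate estimators that the paradox exploits.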
Then they replace the size term $k \,\mathrm{vol}(C)$ with $S(\mathrm{vol}(C))$, where $S$ is a continuous, increasing function, and look for conditions on $S$ that will avoid the paradox.
They find first that $\lim_{x \to \infty} S(x) \le 1$ must hold to avoid the paradox, and that the limit must equal 1 or the estimator ends up returning the whole space as the estimated interval, which isn’t very useful. Since we are dealing with loss functions, we can always add a constant without changing the results, so any $S$ to be used in practice must be a cumulative distribution function.
The second condition on that cumulative distribution function is the cute analytic tool that I mentioned. It is a statement about the relative weight of the tails of two cumulative distribution functions. Let $F_1$ and $F_2$ be cumulative distribution functions. We define $F_1$ to be asymptotically heavier tailed than $F_2$ with the predicate

$$\lim_{x \to \infty} \frac{1 - F_2(\epsilon x)}{1 - F_1(x)} = 0 \quad \text{for all } \epsilon > 0$$

where we have introduced the scaling by $\epsilon$ to remove any confounding from variance. Otherwise, if we were to compare two Gaussian distributions differing only in where they are centered, the predicate would report the one to the right as being heavier tailed than the one to the left, which is erroneous. On the other hand, no matter how we stretch a Gaussian distribution, it will never be reported as heavier tailed than a Cauchy distribution.
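The two claims above can be checked numerically. The sketch below assumes the predicate takes the form "the ratio $(1 - F_2(\epsilon x)) / (1 - F_1(x))$ tends to zero as $x \to \infty$ for every $\epsilon > 0$", which is my reading and not necessarily the paper's exact statement. The helper names `gaussian_sf`, `cauchy_sf`, and `tail_ratio` are made up for illustration. It only evaluates the ratio at a few finite points, so it suggests the limiting behaviour and does not prove it.

```python
import math


def gaussian_sf(x, mu=0.0, sigma=1.0):
    """Survival function 1 - F(x) of a Gaussian, via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def cauchy_sf(x):
    """Survival function of the standard Cauchy for x > 0.

    Uses 1/2 - arctan(x)/pi == arctan(1/x)/pi, which stays accurate for large x.
    """
    return math.atan2(1.0, x) / math.pi


def tail_ratio(sf_light, sf_heavy, x, eps):
    """Ratio (1 - F_2(eps * x)) / (1 - F_1(x)) from the assumed predicate.

    sf_heavy plays the role of 1 - F_1 (the candidate heavier tail),
    sf_light the role of 1 - F_2.
    """
    return sf_light(eps * x) / sf_heavy(x)


def shifted_sf(x):
    """Survival function of a unit-variance Gaussian centered at 1."""
    return gaussian_sf(x, mu=1.0)


if __name__ == "__main__":
    # Shifted vs. centered Gaussian: without scaling (eps = 1) the ratio shrinks,
    # but with eps = 0.5 it grows, so the "for all eps" predicate fails.
    for eps in (1.0, 0.5):
        print("eps =", eps, [tail_ratio(gaussian_sf, shifted_sf, x, eps) for x in (5.0, 10.0, 20.0)])

    # Cauchy vs. Gaussian: the ratio shrinks even when the Gaussian is evaluated
    # at a strongly rescaled argument (eps = 0.1).
    print([tail_ratio(gaussian_sf, cauchy_sf, x, 0.1) for x in (50.0, 100.0, 200.0)])
```

The arguments are kept moderate on purpose: `math.erfc` underflows to zero for very large inputs, which would turn the ratios into `0.0` or a division by zero instead of something informative.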
I have yet to find a use for this besides the proof in which Casella uses it, but it tickled my fancy.