Approximating idealized scientific trials

July 14, 2013

The basic unit of science is the trial. I choose the word 'trial' deliberately, and eschew 'experiment', since a great deal of science does not consist of experiments. Astrophysics, for example, has no experiments whatsoever, only observations. Observations, not experiments, provide the strongest link between smoking and lung cancer, established by long-term studies of identical twins in Scandinavia. Nor should we artificially limit the notion of a scientific trial. The botanist producing illustrations for a flower atlas is running a trial, though its outcome, a drawing, is very different from the result of a physicist measuring the speed of light, and different yet from the trial of an historian producing an account of the past.

Somehow these trials give us knowledge about the world. This is astonishing, and not at all obvious. How is a chemist testing for the presence of mercury in a solution different from a child playing with glassware? Both may be pouring colored liquids from tube to tube.

The difference is that the scientist is trying to run some idealized trial that, if it could be done perfectly, would impart knowledge. Grow two varieties of corn and see which one yields more produce. Count the objects found in archeological sites in Gaul that originated in northern Africa, comparing the number from before the Goths wrested control from the Roman Empire with the number from after, to see how long-distance trade between the two regions changed. Slowly reduce the temperature of a piece of material to find the temperature at which it becomes superconducting. Select a number of specimens of dandelion and use them to make a drawing capturing the important features that identify a dandelion. Each of these idealized trials, if it could be done perfectly, would give us real knowledge about the world.

Unfortunately, when running real trials, we find ourselves in slippery, messy territory. Any real trial is at best an approximation of an idealized trial. Which varieties of corn will get planted where for the trial? Are some corners of the field wetter or drier? Are the varieties cross-pollinating? What watering and fertilizing schedule should we use? Would our results differ if we used a different fertilizer? Trying to reason your way to a perfect approximation of an idealized trial is paralyzing.

It's also futile. Given a full list of the errors we might fall into, we might reason our way to a perfect approximation of an idealized trial, but we cannot get such a list. This is easier to see in hindsight. Consider the first recognizable clinical trial in the West, run in 1747 by James Lind of the British Navy [1]. He took twelve sailors suffering from scurvy and tried six treatments on them in parallel, each treatment given to two men. He didn't allocate the treatments randomly, but rather based on the prevailing wisdom of the time about what was effective. But expecting randomization in the eighteenth century is unreasonable. Randomized experimental designs had to wait for the development of probability through the nineteenth century and then of statistics in the late nineteenth. Lind did not even have the vocabulary to speak of randomization. There is no reason to think we are any less blind to possibilities of error in our own practice.

Interestingly, scurvy trials have certain features that make them less susceptible to the errors our current practice tries to avoid. Sailors long at sea did not spontaneously recover, which removed many of the sources of error that randomization tries to prevent. And almost all of Lind's remedies as an eighteenth-century physician were placebos, whether he knew them to be or not, so he was effectively blinded and always using placebo controls. All this is true of most nutritional diseases, and by the nineteenth century the British Navy, with only this degree of sophistication in clinical trials, was able to render sailors who had been long at sea the healthiest single group in British society.

The simplifying factors of nutritional diseases do not apply to other medical issues. Physicians remained worse than useless for treating most diseases until the late nineteenth century and the introduction of placebo controls, and, through the course of the twentieth century, of double blinding and randomization. Obstetricians remained worse than useless until the last decade or two of the twentieth century.

Returning to our argument, if we cannot, even in principle, perfectly approximate an idealized trial, how are we to say if a given approximation is adequate? The best answer I have found to date is the "reasonable person" standard from law. Its first articulation in 1837 in Vaughan v Menlove is as clear as any:

“The care taken by a prudent man has always been the rule laid down; and as to the supposed difficulty of applying it, a jury has always been able to say, whether, taking that rule as their guide, there has been negligence on the occasion in question. Instead, therefore, of saying that the liability for negligence should be co-extensive with the judgment of each individual, which would be as variable as the length of the foot of each individual, we ought rather to adhere to the rule which requires in all cases a regard to caution such as a man of ordinary prudence would observe. That was, in substance, the criterion presented to the jury in this case and, therefore, the present rule must be discharged.”

Lind's trial was reasonable in his day, but it would not be today. And reasonable practice varies not only with time but with field. The first randomized clinical trial took place in 1946, years after randomization became part of the reasonable person standard in agronomy.

Too often the details of how a scientist approximates an idealized trial, and, in esoteric fields like particle physics, what idealized trial she is approximating, are overlooked when teaching or popularizing the subject. Yet these constitute the majority of the knowledge of a field. Is it obvious that you should paddle your canoe upwind while collecting microbe samples from a freshwater lake? Or that your microscope needs to have its optical path Köhler aligned carefully before trying to use it? Yet without these details the knowledge gained from a trial would become so much argument from authority.

[1] Arun Bhatt, Evolution of Clinical Research: A History Before and Beyond James Lind, Perspect Clin Res. 2010 Jan-Mar; 1(1): 6–10.
