There are certain tactics or gambits that I have found over the years that make code more testable. When I say more testable, I mean that it requires fewer test cases to demonstrate correctness and that those test cases should be easier to construct. In each of the tactics that follow we’ll make those two criteria—fewer and easier to construct—more precise.
Consider a function
What are somethings people might want to pass in for
"2 hours ago"or
"3 weeks ago"
and heaven knows what else. 3PM on the third Thursday of last month, perhaps? Imagine if
events_since knew how to interpret all of these. The test plan is going to be terrifying.
We can make a qualitative calculation of how big such a plan will be. Say we accept those four forms for dates, and for each form we need to pass in N cases. It may be fewer or more by a bit for each case, but it shouldn’t be ten times fewer or ten times more, so we can at least see how it will scale, rather the same way that when we analyze the complexity of an algorithm with big-O notation we neglect lots of constants and aim for a rough scaling law in terms of the size N of the inputs.
For four types like this, we have cases. But what if we have a function like
We combine the test plans for each parameter by taking all combinations. Expressing it in code, if we have test cases
tf_cases, our test cases for both together are
For three or more parameters we take all combinations in the same way. This is called a factorial design and is one of the foundational achievements of statistics.
events_between, if we accept all four kinds as above, we now have cases. is typically on the other of 5 to 10, so we’re looking at 400 to 1600 test cases.
Instead, let’s choose a canonical form for time. In Python that would probably be the
datetime type. In a packet capture system I worked on it was a 64 bit, unsigned integer representing microseconds since the epoch. Then we write adapter functions from our various input kinds to that canonical form. For example,
and our function has a more restricted signature:
Now we have N cases for
ago, N cases for
from_unixtime, N cases for
from_iso, and cases for
events_between. In all, that is cases, as opposed to for accepting everything. For a plausible value like , that’s still only 130 cases as opposed to the 1600 we said before.
This actually underestimates the improvement, though. If we look at our API as a whole, we get similar savings for
events_since as well as
events_between. The more functions we have that use the canonical type instead of handling all cases, the more test cases the canonical form and adapters save us.
Now, the canonical form is not always a concrete type. It may be an opaque interface. For example, imagine that we have a system that needs to be able to notify individuals, other systems, pub/sub queues, or oncalls of an event. If we write a function
What goes it its test plan? It’s going to be huge, and it will require us to be able to set up a test environment for each of the possible kinds of recipient to check that the alert was dispatched properly. Not only is the test plan large, but the individual tests will require complex environments.
Instead, we can create an interface. There’s no “interface” type in Python, but in pseudocode it would look like
from typing import NamedTuple, Callable class Messageable(NamedTuple): send: Callable[[str], None] name: Callable[, str]
In C++ it would be an abstract base class. In Java or Go it would be an interface, since those languages support the idea directly. In C it would be a struct with the fields
name that are function pointers. The
alert function now has the type
And you can again write adapter functions, such as
and whatever else needs to receive alerts. The test plan for
alert now becomes straightforward. We test entirely with a simple implementation of
Messageable, such as
The individual adapters such as
to_oncall need to have more complex test environments, but they stay separate, and we only need those tests for the adapters, not for any other functions that use the
As is often the case, what makes the code more testable also buys us some nice maintainability. If you need to add a new kind of recipient that can receive alerts, you don’t have to touch the
alert function at all, or even have access to its code. You just need to write a new adapter function that produces a
While spelunking through a piece of code I ran across a piece of spaghetti that resembled the following call tree:
Working with code that actually does this kind of ping-pong is a nightmare. What any chunk of code is supposed to do is spread across many functions and it requires a frustrating amount of brainpower just to figure out what’s going on.
Anyway, we said before that when we have multiple parameters, we have to take all combinations of their test cases to get our full test plan. With this kind of spaghetti, the parameters to both
B have to be taken together. If they could have been tested separately, it would have been test cases (1100 cases for ). Together, they need test cases (100,000 with ). 1100 to 100,000 is a big difference.
If we look at what’s going on here,
B.send is being called to do something and is referring to
A to get information it needs for that. We can decouple them and reduce our test cases by making
A give you everything you need and then disposing of it.
We could probably omit the classes entirely and just write
Now we are back to test cases, and the code will be much more readable.
Similarly, if you have a function that does some computation and has side effects, you can simplify your life by splitting it into a pure function that calculates what the effects should be and a simple function that runs side effects.
So if we have a function
we can split it into
delta requires no complex setup. It’s just inputs and outputs. Testing
apply requires more setup, but its test cases are much simpler.
If you apply the extract and dispose idea recursively, you end up with pipelines, such as
x = f(input) y = g(x) z = h(y) m = p(z) n = q(m) return n
At each stage you finish entirely with the previous work and forget everything except what you need for what follows. This is extremely easy to test, but it is also extremely easy to end up thrashing from one step to the next, which brings us to our next tactic:
Here’s some functions that query information about employees:
Your task is to write a function that, given some names, returns their sex, if they have hair, and if they know Haskell. Looking at the types of these functions, this is going to be a bit awkward.
Those three functions could all have the same type. If we choose to make them all the same type, such as
then our code becomes more straightforward:
Now, this is all well and good, but how is it going to improve our testing? The number of test cases for
query isn’t going to change because we reorganized this, even if the code is cleaner, and
knows_haskell are probably querying a database of some sort, so testing
query requires setting up a test environment with that database.
Matching impedances this way buys us power, though, because we can introduce a function that can chains these transforms. Transforms from a type to itself are also called endomorphisms, so you can sound very sophisticated with the battlecry “Chain your endomorphisms!” Anyway, a chaining function:
If we test
chain, then our various functions like
query stop being a thing we write a separate function with testing. We just write
which is something we would probably just write inline wherever we are using it and barely needs testing at all. We can test
chain with whatever functions of the right signature that we want, so we can choose ones that, unlike
is_bald, need no external data source.
For a single function like
query we haven’t bought much, but if we have more than just
query dealing with such functions, then will reduce a lot of test cases by only testing the endomorphisms and chain and having to do very little beyond that.
Now, we could have chosen any of the three function types to standardize on. The one we chose is probably the least flexible because the functions themselves hardcode the names of the keys they write. That can be good or bad depending on your environment. You could just as easily write a
rename endomorphism. For your amusement you might want to try writing a
chain function for the other options.
An extreme and well known case of the kind of thrashing we’re avoiding with this tactic is object relational mismatch, where we go from the world of a relational database where we store aspects of identities to a world of objects where we store entities with all their aspects. In the relational world aspects may be added or ignored at will. In the object world an entity has an expected set of attributes. Objects contain other objects. In relational terms, you join the aspects you need and nothing contains anything else.
The classic example of opaque types is generics, such as
What properties of
T do we need to consider when we test
head? None! We can pick any type that is convenient to test it. We could use
None, but we will want to be able to distinguish the various elements of the lists we pass in, so we’ll want a somewhat bigger type.
int is usually a pretty solid choice in such cases, but we can choose whatever is easy to test, and then in use we can pass whatever complicated type we may need.
But now suppose have a function that queries a service for posts made bya given user between two points in time:
What is that
FormatSpec? Inevitably in an API you need to specify what parts of the data you need to return, and that is what the format spec specifies. Our function probably does nothing with it, just passes it on to whatever service it queries. If we make the client explicit and parameterize it over the type of the
We can create an interface for
Client and subclass it for a fake client for testing and a real client for production, such as
then we can test
MockClient, where the only possible value of
None. That effectively removes a parameter from our test plan. Going from to (with , 10,000 to 1,000 test cases) is a significant win.
Especially if you have to pass a parameter through multiple layers of code, opacifying passthroughs like this can save a lot of tests.
You can structure your code to dramatically reduce the number of test cases it needs.
Noneor similarly boring types in that slot.
Handily, all of these tend to give you code that is easier to read and easier to change over time as well.
If this was helpful to you, look at my course on easier, faster testing.