Debugging & exploration

This is part of a series on getting better as a programmer. The articles are:

The majority of your time as a programmer will be spent dealing either with programs that do not behave the way you expect based on your understanding of them, or with programs you don’t understand yet. For the former, you have to change the program until it matches your expectations. For the latter, you need to build a mental model that lets you predict its behavior. This is the place where programming comes closest to science.

The behavior of a program and its source code are connected by two maps that matter. The first is the chain of compilers and hardware that run the instructions in the program. The second is the mental model you, the programmer, hold linking aspects of behavior to aspects of source code. If you read only one thing about this, read Peter Naur’s “Programming as theory building.”

As a trivial example, consider the program

N = 10
for i in range(N):
    print(i)
for i in range(N+1):
    print(N-i)

The programmer who wrote it from scratch began with a high level intention of printing the numbers from 0 up to 10 and then back down to 0. In constructing the program they built a mental model: the final number is controlled by a constant N outside the main control flow, then a loop counts up and a loop counts down. The second loop prints the highest number. It could have been written with the first loop going to N+1 and the second to N and behave the same way, and the fact that it could have gone either way is part of the model the programmer builds.

Another programmer who comes along later and needs to modify this program will have to construct their own model. If I were faced with this program, I would see the N = 10 at the top and guess, based on the culture of programming I have experience in, that it’s a constant controlling some aspect of the rest. Then I see the two loops. They’re over similar ranges, but off by one. File away the off-by-one for later. One uses the index i, the other uses N-i, so one’s counting up, and one’s counting down. So it prints a sequence of ascending numbers followed by descending ones, and how far up it goes is controlled by N.

Then I need to figure out the edges. What is the end of the first loop, the beginning of the second, and the end of the second? Does it print the top number twice? Is the top number N or something near it? Does it go back to 0 where it started? This I do by a series of experiments. I probably won’t actually run the experiments in the computer if I know the language well. I’ll just simulate them in my head.
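If I did run those experiments on the machine, they would be small ones: shrink N and look at each loop’s output in isolation (a sketch; any small N works):

```python
# Shrink N so the whole output is easy to eyeball.
N = 3

# First loop: counts up, stops just short of N.
print([i for i in range(N)])          # [0, 1, 2]

# Second loop: counts down from N and reaches 0,
# so the top number is printed exactly once.
print([N - i for i in range(N + 1)])  # [3, 2, 1, 0]
```

Two lines of output answer all four edge questions at once.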

Once I’ve ironed those edges out, I have a pretty good model of how this works and where I would go to change how high it counts, or to change the behavior at the edges.

This program is tiny, though. We can fit the mental model in all its detail in our heads. But there’s a limit on how much we can hold in our attention, so in real programs we focus on particular aspects of a program, cut away the parts we can isolate because they don’t affect that aspect, and replace the parts that do affect it with chunked approximations. There’s also a limit to how fast we can turn short term memory into long term memory, so we don’t commit the whole detailed model to long term memory. We remember paths to reconstruct it. I’ve written elsewhere about such paths to reconstruct mental machinery, which got the name “recovery procedures” in mathematics:

A recovery procedure is a set of heuristic rules which we vaguely remember to apply when we want to recover a mathematical fact. It is like a poster on the control panel of some serious machine, a submarine or a plane, which says what the crew should do when things have gone haywire. A rederivation is a semiheuristic argument made in accordance with the recovery procedure. -Borovik, Mathematics Under the Microscope, p.212

Coming back to a part of a program later, we rebuild much of our mental model from these recovery procedures. In looking at the request handler of a web server, we remember roughly what the components of the URL mean, what the pieces of work are, probably some gotchas, and how to figure out what the response looks like. When we speak of “clearly written code” we are talking about how easily we can reconstruct the fully detailed model of a section of the program from our recovery procedures.

At this point we have enough context to actually talk about exploring code and debugging, and what it means to get better at each. On the exploring side, which, despite the order I titled this piece in, always comes first, we get better by becoming better at the lifecycle of constructing mental models, casting them to fit our minds, and linking them with recovery procedures.

We get better at debugging by developing our ability to quickly build models and try to tease out where they disagree with the code and its machine behavior.

Given how long it took to get to this point, you might imagine the rest of this is going to be enormous. And you would be right. We are basically describing the theoretical toolkit of a practicing scientist. Fortunately, we mostly don’t need the full toolkit. Scientists are, to borrow an analogy from Feynman, trying to figure out the rules and goals of chess from snippets of games. We have the advantage of generally knowing the goals, and the rules are likely going to be within certain bounds of what we, as a field, have invented. And this is good, because unlike a scientist who expects to spend years figuring out why a given piece moves a certain way, we generally need to figure out what’s going on in a matter of hours.

Constructing mental models

Let’s begin with the process of constructing models. First, disabuse yourself of the idea that there is some magic way of looking at a situation and conceiving the right model from the void. Every modelling exercise is an iterative process where we start with something that seems vaguely relevant but is probably wrong, check how it fares, revise, and repeat.

How do we choose a good starting point? Usually we start with an analogy to some other system we already know. Is this a web service? I’m pretty sure it has an HTTP server and some way of routing URLs to handlers, and some form of data store behind them. Is it an embedded controller for a dishwasher? There’s going to be some set of sensors and buttons that it reads from, and some kind of state machine, and actuators it controls. Or, in a truly unfamiliar system, we may start as crude as there being an entry point where the program starts and a point where it ends.

Then we try to match the pieces, and then to mismatch them. For a web server’s URL routing, we first try to match it. We might find a fragment like

router = Router()
router.match('/login', handle_login)
router.match('/home', render_home)

or routes spread everywhere as in

@route('/login')
def handle_login(request):
    ...

@route('/home')
def render_home(request):
    ...

Given one of these we would refine our model in a particular way. In the first, we tell ourselves to remember that there’s a single place where calls to .match add routes. In the second we remember that routes are spread throughout the code in annotations.

Then we try to mismatch it. In the first case, we look at handle_login and render_home and look for similar functions in the same part of the code that seem likely to be called by external triggers to provide some behavior. Are they all linked to a route? What if we find a bunch of handler functions and no route for them? We search for where they’re used and discover that they’re handlers for particular frames from a websocket. In the second case, are there annotations other than @route when we look around? Maybe we have a different annotation @frame that also dispatches frames from a websocket. In either case we refine our model to recognize two routers.
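The refined two-router model might look roughly like this in miniature (the @frame decorator and its registry are hypothetical, invented here for illustration):

```python
# Hypothetical second router: websocket frames dispatch through their
# own registry, parallel to but separate from the URL routes.
FRAME_HANDLERS = {}

def frame(kind):
    """Register a handler for a given websocket frame kind."""
    def register(fn):
        FRAME_HANDLERS[kind] = fn
        return fn
    return register

@frame('chat.message')
def handle_chat_message(payload):
    return 'chat: ' + payload

# Dispatch works just like URL routing, keyed on frame kind, not path.
print(FRAME_HANDLERS['chat.message']('hello'))  # chat: hello
```

Once we recognize the shape, the mismatch stops being an anomaly and becomes a second instance of a pattern we already have a model for.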

Then we do it again. Maybe we go further here, by looking at handle_login and saying to ourselves, “There must be the rest of an authentication system here.” Then we go looking for password hashing, session cookies, permissions, and the rest of the things we expect in an authentication system. We could also back out and take some other vague piece, such as the data store. What databases are being set up?

Casting models to fit our minds

Our models have to work within our cognitive limits. Generally this means you’ll end up with a web of models linked together. Each model is tractable for our purposes, and its links have two pieces. The first piece is a recovery procedure. Our top level model of a web service may only be that there’s a Django application running as a systemd service on a server in AWS, with a managed PostgreSQL instance as its data storage, with its schema in a SQL file in our source code, a plain HTML frontend using forms, and logs being written to local disk. Each of these bits is a breadcrumb trail for our minds. It’s a systemd service? Somewhere there had better be a systemd unit file in our source code describing how it’s run. It’s Django? If we use Django for lots of projects that probably leads into a whole set of models about Django that we reuse from project to project. If we don’t use Django much, then ‘Django’ may just be a tag we keep to make it easy to search for things online, and our recovery procedure is actually that there’s a list of URL patterns in one place that point to handlers. The handlers we may remember in categories like authentication, CRUD for various entities the program deals with, and administrative tools, and each category carries a recovery procedure for another model.

The second piece is a massively simplified view of that part of the system that lets us treat it as a small enough box that it leaves cognitive space for something else. For authentication, that view might be that there is a session cookie if you’re logged in and a login method to obtain such a cookie. That’s tiny and, if you write the code to support it, you can operate on many aspects of the program with only that level of detail about authentication in your working memory.
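That simplified view is small enough to write down in full (a sketch; the names and the credential check are invented for illustration):

```python
import secrets

# The whole of authentication, as seen from the rest of the program:
# login() hands out a session token, is_logged_in() checks for one.
SESSIONS = set()

def login(username, password):
    # Credential checking elided; assume it passed.
    token = secrets.token_hex(16)
    SESSIONS.add(token)
    return token

def is_logged_in(token):
    return token in SESSIONS

token = login('alice', 'correct horse battery staple')
print(is_logged_in(token))    # True
print(is_logged_in('bogus'))  # False
```

Everything else in the program can interact with authentication through just those two names, which is exactly what makes the chunked view safe to hold.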


Human memory is associative. That is, we remember things because we remember something else they were linked to.

This has led to all kinds of tricks like memory palaces to structure large amounts of information so we can recall it. A friend of mine has basically the entire range of diagnostics she learned in medical school arranged into such a structure. Our needs as programmers are a little different because the information we care about shifts quickly as programs change or as we work on new projects. My friend’s memory palace will serve her for decades. My knowledge of a particular codebase will probably only serve me for a couple years, and parts of it will change between times that I look at them.

This is why the recovery procedures linking our models are so important. We use particular signposts in the code to trigger the memory of the next steps in our model. So if we return to that web service a year later, we remember that there’s a table of URL routes and go searching for that. Those routes and their names then trigger our categories, and the names of the handlers in each category trigger our memory of the pieces of our more detailed model of each.


Debugging

Debugging is what we do when our model of a program does not accurately predict what it does. We have to isolate where the mismatch is, what it is, and how we remedy it.

Isolation is constructing a new model in our web of models, but it differs in an important way. An isolated model has very sharp boundaries that we can control. As an extreme example, if we have a function with no side effects, running in an environment where the bugs have largely been ironed out, and we can easily construct example inputs for it and examine the outputs from it, then our isolated model’s boundaries are exactly those of the function. We can write unit tests to force it to run on certain conditions, then step through with a debugger to isolate subsets of the function into even smaller models.
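For instance (a toy function, invented here to stand in for the real thing):

```python
# A side-effect-free function whose boundary we fully control.
def normalize(path):
    # Collapse repeated slashes and drop a trailing one.
    parts = [p for p in path.split('/') if p]
    return '/' + '/'.join(parts)

# Unit tests pin the boundary: we choose the inputs exactly,
# so the bug has nowhere to hide outside the function body.
assert normalize('/a//b/') == '/a/b'
assert normalize('//') == '/'
print('boundary pinned')
```

With the boundary pinned like this, a debugger session inside the function starts from known, reproducible conditions.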

This successive isolation is much like any other search strategy. Bisect the function and see if the problem occurs in the first half or the second, then bisect again. Similarly slice up your inputs and see if you can find simpler cases that trigger it.
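Slicing inputs can even be automated. Here is a minimal sketch of bisecting a failing input, where the toy predicate stands in for “the bug reproduces”:

```python
# Repeatedly halve the input while the bug still reproduces in one half.
def shrink(data, triggers_bug):
    while len(data) > 1:
        half = len(data) // 2
        if triggers_bug(data[:half]):
            data = data[:half]
        elif triggers_bug(data[half:]):
            data = data[half:]
        else:
            break  # the bug needs both halves; stop shrinking here
    return data

# Toy bug: fails whenever the input contains a 0.
print(shrink([5, 2, 0, 7, 9, 1], lambda xs: 0 in xs))  # [0]
```

Real shrinking tools are cleverer about cases where the bug needs pieces from both halves, but the search strategy is the same.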

In more complicated cases, you often can’t isolate immediately. You’re not sure where the boundaries are. In these cases it helps to use a debugger or some other means to trace all the way through, looking for where mismatches show up.

Alternately, sometimes you have error messages or other hints where a somewhat unique string of characters lets you search and jump right into the heart of the matter. This doesn’t always happen, but when it does you seem like a wizard. I once looked over the shoulder of a colleague when they were having an issue, saw a certain snippet of text involving a database error, and said, “I bet I know what evil is causing this.” I then walked back to my computer, did a quick grep, and thirty seconds later had the offending passage of code on my screen with a mental model that isolated it. We rarely get this lucky, but it’s worth cherishing these memories for the times when it’s truly a slog.

The most difficult cases are where the mismatches are due to the interaction of diffuse state throughout the system, and when they occur only some of the time. These quickly leave the realm of our familiar analogies with other systems and become much more like science.

How to ascend

At the beginning, read code and ask about it. Your ability to navigate new systems is roughly proportional to the number of previous systems you have developed an understanding of.

Tactically, get a modern IDE like those from JetBrains and learn to use their code navigation tools. Visual Studio Code and other editors often have similar tools based on the Language Server Protocol today. An hour or so getting comfortable with these pays for itself in a few days as you explore code.

Books and essays on how systems should be structured are relatively useless when you’re trying to learn exploration and debugging. You need to explore real systems, not what people have decided they should be, and learn to set up your own signposts and recovery procedures. The exception is The Architecture of Open Source Applications, which has the designers of real systems walking through their mental models of their creations.

The other tactical skill to learn is how to use a debugger effectively. Most people have figured out that you can set breakpoints and look at the values of variables and the stack. Good debuggers let you set breakpoints that only fire when a condition is true, or capture whenever a certain exception is thrown anywhere in a program. Some let you script them. I remember seeing a very cool trick someone played with gdb where they put a breakpoint on the exit function of their program and had it restart the program when it occurred, and also set a conditional breakpoint for when the randomly occurring condition they were chasing held. Then they started it and the program ran again and again before finally stopping with the condition they wanted. There are also time travel debuggers which let you step forwards and backwards in time. Combined with the previous trick you can capture state, then step backwards to see how you arrived at it.
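That restart trick can be sketched in a few gdb commands (the function and variable names here are placeholders, not from any real program):

```gdb
# Stop whenever the program is about to exit...
break exit
commands
  # ...and immediately start another run instead of stopping.
  run
end

# Separately, stop for real when the condition we care about holds.
break process_item if status == -1

run
```

gdb then loops the program until the conditional breakpoint finally fires.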

Debuggers aren’t always available. Live systems in production aren’t always reachable, and distributed systems are notoriously hard to debug without some kind of kludge of attaching separate debuggers to the different pieces and trying to coordinate them (which everyone tries at some point and swears never to do again if they can avoid it). In this situation we fall back to having the system emit a steady stream of information about what it’s doing, examining that stream, and trying to infer from this historical data what we would have learned from tracing in a debugger.
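A common shape for that stream is one structured line per event, so it can be filtered and correlated after the fact (a sketch; the event names and fields are invented for illustration):

```python
import json

def event(request_id, step, **details):
    # One structured line per event: easy to grep, easy to
    # correlate by request_id when reconstructing a trace.
    return json.dumps({'request': request_id, 'step': step, **details})

# In a real system these lines go to a log file or aggregator;
# here we just print them.
for line in [
    event('req-1', 'received', path='/login'),
    event('req-1', 'db_query', table='users'),
    event('req-1', 'responded', status=200),
]:
    print(line)
```

Given such a stream, filtering on one request id recovers an approximation of the trace a debugger would have shown live.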

This series is still being written. Subscribe to get emailed when each section is posted: