madhadron

Development process & architecture

This is part of the series on getting better as a programmer.

I am not fond of the term ‘architecture’ applied to software. Software is not a building, nor is most of the thinking that applies to buildings helpful for thinking about how we organize the programs we write. I prefer ‘program organization,’ and I will use that term instead of architecture in spite of the title.

One lesson that does transfer from buildings is that you can’t separate the system that creates a work from the character of the work that results. This extends far beyond buildings and programming. It’s in the work of Christopher Alexander, famous for creating the idea of pattern languages in architecture. It’s in Marshall McLuhan’s aphorism that “the medium is the message.” In software we have Conway’s law, that a program’s organization will reflect the structure of the organization that created it. W. Edwards Deming, who laid much of the groundwork for this understanding, summarized it as: “the system that people work in may account for 90 or 95 percent of performance.” Development process and program organization are two sides of the same coin.

Program organization

Let’s start with program organization. One of the reasons I dislike the term ‘architecture’ is its connotation of producing one building, the single result of the work. Program organization becomes vastly more straightforward when we think in terms of what David Parnas termed ‘program families.’

The idea of a program family emerges when we think of all the variations we might write of a program. We may have different sets of features for different deployments or customers. We have various pieces of the program hooked up to test suites. We have subsets of it designed for local development or verification. As we alter a program we are taking it from one member of the family to another. There is some notion of pieces of a program that we mix and match and wire up in different ways, both ones currently in use and hypothetical ones we might want to produce.

Once we are thinking in terms of this ensemble of the programs in a family, certain approaches become obvious. We may want to replace some implementation detail, or wire up a subset of functionality to a test suite. For the former we must hide the implementation detail in a way where our program family can easily encompass members with other choices. For the latter, the view from a given piece of a program must be sufficiently limited that we can wire it to test suites or full versions of the program. We refer to each such piece, which hides a secret that can be replaced in various members of the program family, as a module. This is the original definition given by Parnas.
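To make the idea of a module hiding a replaceable secret concrete, here is a minimal sketch in Python. Everything in it is hypothetical and chosen only for illustration: the names DocumentStore, InMemoryStore, FileStore, and append_line are mine, not Parnas’s. The secret is where documents are kept. One member of the family keeps them in a dict so it can be wired to a test suite, another keeps them on disk, and the behavior in append_line cannot tell the difference.

```python
from pathlib import Path
from typing import Protocol


class DocumentStore(Protocol):
    """The module's interface: all that the rest of the program may rely on."""

    def save(self, name: str, text: str) -> None: ...
    def load(self, name: str) -> str: ...


class InMemoryStore:
    """Family member for test suites: the secret is a plain dict."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def save(self, name: str, text: str) -> None:
        self._docs[name] = text

    def load(self, name: str) -> str:
        return self._docs[name]


class FileStore:
    """Family member for deployment: the secret is a directory on disk."""

    def __init__(self, directory: str) -> None:
        self._directory = Path(directory)

    def save(self, name: str, text: str) -> None:
        (self._directory / f"{name}.txt").write_text(text, encoding="utf-8")

    def load(self, name: str) -> str:
        return (self._directory / f"{name}.txt").read_text(encoding="utf-8")


def append_line(store: DocumentStore, name: str, line: str) -> str:
    """Behavior written against the interface only, so it runs in every member."""
    try:
        text = store.load(name)
    except (KeyError, FileNotFoundError):
        text = ""  # a document that does not exist yet starts empty
    text += line + "\n"
    store.save(name, text)
    return text
```

Swapping InMemoryStore for FileStore moves the program from one member of the family to another without touching append_line, which is the property the definition asks for.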

Parnas continued this line of thought and arrived at a generic statement that every program consists of modules hiding three kinds of secrets. One kind hides hardware so it can be replaced. A second hides the various behaviors that this project requires. The third hides decisions around algorithms, mathematical theorems, physical facts, and other things that are entirely internal to the programming effort and do not appear in the requirements of the software.

The task of program organization, then, is to identify the modules and how they are to be connected. Exploring the space of the program family generally yields the modules. For arranging the connections between them, the best method I have found is data modeling.

Essentially all programs exist to process data from some source and put it in some destination. Even something as seemingly abstract as a numerical simulation in physics is beginning with a current state of the simulation, generating a new state from it, and writing that new state in a place where it is available to later iterations of the simulation. This question of where data originates, where it will be processed, where it will be written, and on what schedule this all happens in various members of the program family turns out to be a very general design approach.

For example, consider something like a word processor. We type a key. The program needs to take the current state of the document and the window and the key we typed and produce an updated state of document and window. In various members of the program family, what the window consists of may vary, so we need to hide that in a module from the rest of the program. Similarly, we might want different input sources or keymaps, so the origin of the event should be hidden in a module. What lies between those two, the updating of state, should clearly be secret from the other two. The model’s update occurs after the controller’s event, and the views are updated after the model is updated, so our flow of control can only be organized in one way. And so we quickly find ourselves with the model-view-controller structure where some controller emits events that a model uses to update, and various views subscribed to the model are notified and update themselves based on the new value of the model.
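The word processor example can be sketched in a few lines of Python. This is a toy under stated assumptions, not a real editor: the class names, the "BACKSPACE" event name, and the keymap are all hypothetical. The controller hides where keys come from, the model hides how state is updated, and the view hides what the window is.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Model:
    """Hides how document state is stored and updated."""

    text: str = ""
    subscribers: list[Callable[[str], None]] = field(default_factory=list)

    def subscribe(self, on_change: Callable[[str], None]) -> None:
        self.subscribers.append(on_change)

    def handle_key(self, key: str) -> None:
        # Update state first, then notify every subscribed view.
        if key == "BACKSPACE":
            self.text = self.text[:-1]
        else:
            self.text += key
        for on_change in self.subscribers:
            on_change(self.text)


class Controller:
    """Hides the input source and the keymap from the model."""

    def __init__(self, model: Model, keymap: dict[str, str]) -> None:
        self._model = model
        self._keymap = keymap

    def press(self, raw: str) -> None:
        # Translate raw input through the keymap; unmapped keys pass through.
        self._model.handle_key(self._keymap.get(raw, raw))


class RecordingView:
    """Stands in for a window; this family member just records each frame."""

    def __init__(self) -> None:
        self.frames: list[str] = []

    def render(self, text: str) -> None:
        self.frames.append(text)


model = Model()
view = RecordingView()
model.subscribe(view.render)
controller = Controller(model, keymap={"\x7f": "BACKSPACE"})
for raw in ["h", "i", "!", "\x7f"]:
    controller.press(raw)
print(view.frames)  # ['h', 'hi', 'hi!', 'hi']
```

A test suite is just another member of the family here: RecordingView replaces a real window, and nothing in Model or Controller changes.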

The most developed mental structure to work with this is to use a relational model to define the logical structure of the data, and then map that onto physical processing as if you were a query planner. This provides a mathematically general substrate for considering the structure of the data and decades of background on how that logical structure can be mapped to real computation and what needs to be considered when doing so.
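As a small illustration of separating logical structure from physical processing, the sketch below defines two relations and one logical query (join them on doc_id), then computes it with two different physical plans. The relation names and sample rows are hypothetical, and the hash join assumes doc_id is a key of the documents relation.

```python
# Logical structure: two relations, each a list of tuples.
Document = tuple[int, str]  # (doc_id, title)
Edit = tuple[int, str]      # (doc_id, key typed)


def nested_loop_join(documents: list[Document], edits: list[Edit]) -> list[tuple[str, str]]:
    """Physical plan 1: compare every pair of rows. No extra memory, O(n * m) work."""
    return [
        (title, key)
        for doc_id, title in documents
        for edit_doc_id, key in edits
        if doc_id == edit_doc_id
    ]


def hash_join(documents: list[Document], edits: list[Edit]) -> list[tuple[str, str]]:
    """Physical plan 2: build a hash table on one side, probe it with the other."""
    titles = {doc_id: title for doc_id, title in documents}  # build phase
    return [
        (titles[edit_doc_id], key)                           # probe phase
        for edit_doc_id, key in edits
        if edit_doc_id in titles
    ]


documents = [(1, "letter"), (2, "report")]
edits = [(2, "a"), (1, "b"), (2, "c"), (3, "d")]

# Same logical answer, different physical processing.
assert sorted(nested_loop_join(documents, edits)) == sorted(hash_join(documents, edits))
```

The choice between the two plans depends on where the data lives and how much of it there is, which is the kind of question a query planner answers and the kind worth asking when connecting modules.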

Development system

Development process describes the norms of how programmers and other people involved in software development receive information, make decisions, and allocate their attention. In the end, all we have are individuals allocating their attention and effort at the present moment based on the information available to them. The combination of these norms, the programs they produce with their effort, the tools and environments they use to do so, and the feedback loops that provide the information and incentives they use to make decisions is the software development system.

One useful way to think about this is in terms of Boyd’s OODA (observe-orient-decide-act) loops.

One of Boyd’s key insights in fighter plane tactics was that if you are going through the OODA loop faster than your opponent, then they at best have to react to your decisions, and at worst are taking actions based on the world before you acted.

Software development, unlike flying a fighter plane, is generally not adversarial, but the rate at which you cycle through the loops still matters. Consider the difference between making a change to a program where the compiler and test suite tell you in less than a second if your change is working versus making a change when you must deploy it and wait three days to find out if it worked. A programmer in the former case can loop hundreds of times for every loop of the latter. Every loop is a chance to correct errors, conduct experiments, and learn.

What limits the speed of our loops? First, the time required to actually make a meaningful change that will result in valuable information, and, second, the rate at which that change can produce valuable information once it is made. If it takes a couple days to make a change to your program and knowing if it is what was needed takes observing users for a week, you cannot usefully loop any faster than ten days to two weeks. If you are writing a program to meet your own needs your loop may be measured in seconds.
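The two limits can be put into rough arithmetic. In the sketch below the loop period is simply the time to make a change plus the time for that change to produce information. The ten minutes spent making each change is my assumption for illustration; the one second and three day feedback times are the two cases described earlier.

```python
from datetime import timedelta


def loop_period(make_change: timedelta, get_feedback: timedelta) -> timedelta:
    """One full loop: make a meaningful change, then wait to learn from it."""
    return make_change + get_feedback


# Assumption: each change takes ten minutes to make in both cases.
edit_time = timedelta(minutes=10)

fast_loop = loop_period(edit_time, timedelta(seconds=1))  # compiler and tests answer at once
slow_loop = loop_period(edit_time, timedelta(days=3))     # deploy and wait three days

# How many fast loops fit inside a single slow one.
ratio = slow_loop // fast_loop
print(ratio)  # 432
```

Under that assumption the fast feedback case gets a little over four hundred chances to correct course for each one in the slow case. Shrinking the edit time widens the gap further, since the three day wait then dominates the slow loop entirely.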

One of the first things we have to learn is how we want to organize our program. Since our program’s organization will reflect the organization of the people writing it, we want to begin software projects with a team that can fluidly organize and reorganize on a time scale faster than the project’s OODA loop. Such a team must be small and must be free from outside constraints that will prevent its reorganization. That it must be small is fairly obvious, but what are these constraints? Consider a company where programmers are evaluated twice a year on whether they produced the results they told management that they would at the beginning of the six-month period. They have had to promise something and will be penalized for deviating from it. The team they are on will not reorganize because they are not going to change what they are working on.

Organizations, as they grow, tend to grow such constraints. They feel natural to managers trained in modern corporations. In a large organization where such constraints have ossified the structure of both the organization and the software, the software development system becomes ad hoc and no one has the power to modify it. Given that, managers start focusing on individual behavior within whatever system they find themselves in. This is why the industry today is so obsessed with personal impact. It removes attention from the learned helplessness of everyone involved regarding the software development system, and provides another lever for the control of individuals.

In an organization where the software is basically complete and receives only incremental changes and fixes, this may be fine. Companies like Google and Facebook have established near monopolies with stable revenue streams. The company’s focus is on maintenance of their revenue stream. Organizations in this state do not produce dramatically new software that meets new needs. They buy it from small teams that do.

This does not change the fact that nearly all of the performance of software development teams is determined by the system. If we can actually modify our system, that is where we should focus.

When we are looking at what our system should be, the first thing we focus on is the natural loop size. If the firmware you are writing cannot be deployed more often than once a year when an aircraft comes in for a scheduled overhaul, the natural loop of your system is one year. If you are updating a website with a lot of traffic, your natural loop may be hours or days as you make a change, deploy it, and wait for enough traffic to pass that you can learn from it.

Next, what are the inner phases of that loop? For updating a website, you may have a quality check that runs once in the loop. For annual firmware deployments, you will likely have whole inner loops being deployed to test aircraft where you make a change, perform a quality check on it, deploy it to the test aircraft, and then run the test aircraft through a set of scenarios to see the results. Your inner loop will likely be days or weeks. You may also have phases that are required by law. In Canada some software requires a professional software engineer to sign off on its release. If the software is changed, it has to go through the whole evaluation and sign off process again. Embedded software for medical devices typically has testing gates like this as well.

Then, how do you get the information from that loop to the team in a useful way? How do they make decisions on it and what is incentivized? Especially as software organizations get larger, feedback loops can break down and disappear entirely, or even be intentionally destroyed by groups trying to protect territory. The other thing that breaks feedback loops is heroism, where someone steps in to try to keep something working that otherwise would systemically break or fail. Each time someone does this, it stops information about the problem from reaching where it needs to go. If it is your job in the system to handle an issue, handle it. If it is not, let it fail and try to mitigate the damage. Otherwise the system will not adjust.

A given team operating as part of a given software development system will get some amount done in each loop. That amount is dictated almost entirely by the system. In a given loop, the team can do what seems most important to them, nothing more. If a system has been operating for a while you may have data on its past production that will let you predict how long similar tasks are likely to take in future. In these situations, planning months or years out may work fine. In manufacturing this is called a process being ‘in control.’ Conversely, there is no point trying to make detailed predictions for a process that is not in control. In such situations, roadmaps and long-term projections are so much nonsense.

It is very easy to take a process that is in control and send it out of control. Lay off 20% of a team and the process is now out of control. Mandate a new quality gate and the process is now out of control. Force a team to crunch to meet a deadline, and the process is now out of control. The projections based on the previous, in-control state of the system are useless. The only way to have usable projections again is to bring the system back into control and measure again.
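One common way manufacturing checks whether a process is in control is a control chart. The sketch below computes limits for an individuals (XmR) chart, where the limits sit at the mean plus or minus 2.66 times the average moving range, and then flags points that fall outside them. The cycle times are invented numbers for illustration, and treating days per task as the measured quantity is my assumption.

```python
from statistics import mean


def control_limits(samples: list[float]) -> tuple[float, float]:
    """Individuals (XmR) chart limits: mean +/- 2.66 * average moving range."""
    center = mean(samples)
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center + spread


def outside_limits(samples: list[float], limits: tuple[float, float]) -> list[float]:
    """Return the samples that fall outside the control limits."""
    low, high = limits
    return [x for x in samples if not low <= x <= high]


# Hypothetical cycle times, in days per task, while the system was stable.
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
limits = control_limits(baseline)  # roughly (2.34, 7.66)

# Hypothetical cycle times after a disruption such as a layoff.
after_disruption = [6, 9, 11, 8]

print(outside_limits(baseline, limits))          # []
print(outside_limits(after_disruption, limits))  # [9, 11, 8]
```

While points stay inside the limits, past data is a usable guide to the future. Once they fall outside, the old limits describe a system that no longer exists, and new ones can only be computed after the process has settled again.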

There is one last point I want to make about process: one thing done is better than two half done. This is true in terms of value provided. Half done things provide no value. This is true in terms of the system being in control. Partially done work introduces variation and drags away the attention of a team and makes it more likely that the past behavior of the system is about to become of no use as a guide. This is true in terms of the mental health of the people involved. Switching contexts is one of the most taxing parts of a programmer’s work. Systems should be optimized to have work flow to completion rather than pile up partially done.
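A toy schedule shows the arithmetic behind one thing done being better than two half done. Assume two tasks, A and B, each needing five days of work from one person, and assume generously that switching between them costs nothing. The schedules and the helper below are hypothetical.

```python
def completion_days(schedule: list[str]) -> dict[str, int]:
    """Map each task to the day on which its last unit of work is done."""
    finished: dict[str, int] = {}
    for day, task in enumerate(schedule, start=1):
        finished[task] = day  # later days overwrite earlier ones
    return finished


# Each entry is one day of work on the named task.
one_at_a_time = ["A"] * 5 + ["B"] * 5
interleaved = ["A", "B"] * 5

print(completion_days(one_at_a_time))  # {'A': 5, 'B': 10}
print(completion_days(interleaved))    # {'A': 9, 'B': 10}
```

Both schedules finish everything on day ten, but working one at a time delivers A on day five, while interleaving delivers nothing until day nine. Any real cost of switching contexts only makes the interleaved schedule worse.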

How to ascend

Before diving into program organization in any deep way, you need to be able to chunk programming tasks, that is, given some problem of a kind that shows up regularly in your work, your mind knows the shape of the solution and how it will fit into a system without having to actually write out the code. This comes only from writing code, reflecting, and revisiting it.

I talked about relational databases and modeling in data persistence, and it is essential to learn here. After that, read David Parnas’s work on modules and program families, probably a half dozen papers. There is a collection of his selected works entitled Software Fundamentals that is worth your time.

After that, it becomes a question of studying systems. Look at the various named architectures people have written about (model-view-controller, map-reduce, lambda, hexagonal, and many others). Analyze them in terms of module structures and data flow. These particular architectures are someone’s solution for a particular program family. What about that program family drove it? What limits on the program family make parts of it make sense? If you extend the family in some direction, does it stop making sense? And then study the organization of lots of actual programs.

Moving on to process, there is a large, useful literature on the design of systems of production. Start with W. Edwards Deming’s Out of the Crisis (though ignore the section on intrinsic and extrinsic motivation). Read Goldratt’s work on constraints. Be cautious of how you apply what you read to software, though. Software and manufacturing are very different.

Read Donella Meadows’s Thinking in Systems and look into Jay Forrester’s work on industrial dynamics. Once you have that basis, go look at the literature on agile, kanban, and lean. There is a lot of value there, but you need a basis to judge it. The material in the previous paragraph provides that background.

Sadly, effective systems are under continuous assault from those who either don’t understand them or are incentivized to break them. If you are in an organization where the system has already broken down and ossified, your best bet is to do what you are incentivized to do. Failing to do so prevents information on systemic failures from getting where it needs to go. The systems of political power in organizations determine whether you can defend and maintain a system or successfully operate in an ossified one. Pfeffer’s writing (The 7 Rules of Power and Leadership BS are good places to start) is probably the best information available on this.

As you study this, you will probably realize that how your employer organizes their software development is absurd, that your management does not know what it’s doing, and that they are cargo culting rituals that prevent the programmers from becoming more effective. The important thing to know is that you are not crazy.

There are also enormous amounts of nonsense out there about systems and development process and organization. Some of it is information that was good, but poorly repackaged and misinterpreted (see The Phoenix Project). Some of it is pure rubbish (Good to Great and the rest of Jim Collins’s work). Some of it is apologia to cover up an individual’s own failures and incompetence (anything by Jack Welch). This is a fraught area.

This series is still being written.