madhadron

Build systems

Status: Notes

Reliable, complete builds

I would have hoped that this would be obvious, but you should be able to build your project or any piece of it with one command, and that command should always work. Always.

Why buck/bazel

(TODO)

Maven’s good ideas

Most build systems in use today are descended from make. In make and its spawn (Ant, SCons, Rake, CMake, etc.), a build is defined as a set of targets, each of which can depend on other targets. Each target is associated with a set of commands to reach it, and the dependencies among targets form a directed acyclic graph.
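To make the target graph concrete, here is a minimal sketch in Python of how a make-style system decides what to run. The target names and the `build_order` helper are my own illustration, not any real tool's API; it only assumes the standard library's `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical targets for illustration: each maps to the targets it depends on.
TARGETS = {
    "compile": [],
    "test": ["compile"],
    "package": ["compile", "test"],
    "clean": [],
}


def build_order(goal, targets=TARGETS):
    """Return the targets needed to reach `goal`, dependencies first."""
    # Walk backwards from the goal to collect only the targets it needs.
    needed = {}
    stack = [goal]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed[name] = targets[name]
            stack.extend(targets[name])
    # Sort so every target comes after its dependencies;
    # graphlib raises CycleError if the graph is not acyclic.
    return list(TopologicalSorter(needed).static_order())
```

Asking for `build_order("package")` yields compile, then test, then package, while `build_order("clean")` touches nothing else. A real make would also check timestamps and run the commands attached to each target.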

When using a make-derived system, I begin a new project by copying and editing a build script from a previous project. There is a core of behavior which is unchanged. Most projects have targets named “compile”, “test”, “package”, and “clean”. Two Java programs will have similar commands in each of those targets, differing by adding a dependency here, changing a classpath there, but rarely differing in fundamental ways.

Could I describe these builds as differences from an underlying, simple Java build? I could, but describing deltas of a general DAG is complicated. Maven’s first good idea is replacing the DAG with something whose diffs are easier to describe. Maven’s choice is sequences of labelled states. Maven calls each sequence a “lifecycle” and each state a “phase”. So the Java program above might have two lifecycles, one with the states “compile”, “test”, and “package”, and one with the single state “clean”. A set of lifecycles, and the default behavior attached to them, is grouped in what is referred to as a “packaging”.

The packaged lifecycles will not fit any but the simplest projects unchanged, but the transformations are simple: add a hunk of behavior to a labelled state or remove a hunk of behavior from a labelled state. This resembles the hooks system in Emacs or the advice system in Common Lisp. Removing behavior is fairly unusual, but adding behavior is ubiquitous, for example running a precompiler across a piece of source code before passing it off to a compiler. Changing the parameters to an existing behavior, such as setting whether to include debug symbols, is just as common.

Adding behavior is handled in Maven by surrounding the standard phases of a lifecycle with extra phases to which behavior can be attached. For example, the actual clean lifecycle in Maven has the phases “pre-clean”, “clean”, and “post-clean”. The build lifecycle used by the jar packaging has many more. Modifying existing behavior is handled by having the behavior attached to each phase define parameters which can be set in the build definition.
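The lifecycle idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Maven's implementation: the `Lifecycle` class, the `java_packaging` function, and the “pre-compile” phase name are all made up for illustration, and each behavior is just a function returning a label.

```python
class Lifecycle:
    """A linear sequence of named phases, each holding a list of actions."""

    def __init__(self, phases):
        self.phases = list(phases)
        self.actions = {phase: [] for phase in self.phases}

    def bind(self, phase, action):
        """Add a hunk of behavior to a labelled phase."""
        self.actions[phase].append(action)

    def unbind(self, phase, action):
        """Remove a hunk of behavior from a labelled phase."""
        self.actions[phase].remove(action)

    def run(self, goal):
        """Run every phase up to and including `goal`, in order."""
        log = []
        last = self.phases.index(goal)
        for phase in self.phases[: last + 1]:
            for action in self.actions[phase]:
                log.append(action())
        return log


def java_packaging():
    """A hypothetical 'packaging': a lifecycle plus its default behavior."""
    build = Lifecycle(["pre-compile", "compile", "test", "package"])
    build.bind("compile", lambda: "javac")
    build.bind("test", lambda: "junit")
    build.bind("package", lambda: "jar")
    return build
```

A project is then described as a delta from the packaging: `build = java_packaging()` followed by `build.bind("pre-compile", lambda: "precompiler")`. Calling `build.run("test")` runs the precompiler, the compiler, and the tests, and stops before packaging. Because phases are a list rather than a graph, "where does my extra step go" always has a one-word answer.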

Good idea 1. Replace a DAG of build targets with a set of linear sequences.

In make and its spawn, you must manage your dependencies by hand. Libraries must be installed correctly, and any custom behavior for your build system must be available. These are all things that package management systems, as epitomized by apt and the BSD ports tree, handle for systems administration. Programmers, traditionally, have been left to struggle by hand, with package management added downstream of actual development. Maven included a way of referring to packages stored elsewhere as dependencies and downloading them.

This may seem obvious today, but it really began with Maven. CPAN had pioneered having a central repository of modules for a language in the mid-1990s, but it was a way of managing modules that you had installed on a machine, not a way of fetching dependencies of a program. All Java build systems since Maven use Maven’s system for referring to and fetching dependencies, and other languages have imitated it. Plugins providing additional behaviors for Maven are also referred to in the same way.
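As a rough illustration of what “referring to” a dependency means, here is a Python sketch of turning a Maven-style coordinate (group, artifact, version) into the path where a repository conventionally stores the artifact. The `Coordinate` class and its method names are my own; the layout assumed is the standard Maven repository one, and the sketch ignores classifiers, snapshots, and checksums.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinate:
    """A dependency name that is unique across the whole repository."""

    group_id: str
    artifact_id: str
    version: str

    @classmethod
    def parse(cls, text):
        """Parse the 'group:artifact:version' shorthand."""
        group_id, artifact_id, version = text.split(":")
        return cls(group_id, artifact_id, version)

    def repository_path(self, extension="jar"):
        """Path of the artifact relative to a repository root."""
        # Dots in the group become directory separators.
        group_path = self.group_id.replace(".", "/")
        filename = f"{self.artifact_id}-{self.version}.{extension}"
        return f"{group_path}/{self.artifact_id}/{self.version}/{filename}"
```

For example, `Coordinate.parse("org.apache.commons:commons-lang3:3.12.0").repository_path()` gives `org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar`. The build tool only needs the coordinate and a repository root to find, download, and cache the file.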

Good idea 2. Provide a unique namespace of dependencies and let the build system refer to them.