Misunderstood Maven

December 30, 2013

I finally learned to use Maven. I had looked at it several times, but always found the descriptions so confusing that I gave up and got on with other work. Now that I understand it, hopefully I can save someone else some time by describing its structure without resorting to undefined terms or analogies.

Most build systems in use today are descended from make. In make and its spawn (Ant, SCons, Rake, CMake, etc.), a build is defined as a series of targets, each of which can depend on other targets. Each target is associated with a set of commands to reach it, and the dependencies among targets form a directed, acyclic graph.
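To make the target graph concrete, here is a minimal sketch in Python of how a make-style tool might decide what to run. The target names and the `build_order` function are illustrative assumptions, not taken from any real build tool.

```python
# Toy model of a make-style build: each target lists the targets it depends on.
# The target names are illustrative only.
TARGETS = {
    "compile": [],
    "test": ["compile"],
    "package": ["test"],
    "clean": [],
}

def build_order(goal, targets=TARGETS):
    """Return the targets to run, dependencies first, to reach `goal`."""
    order = []
    visiting = set()

    def visit(name):
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        visiting.add(name)
        for dep in targets[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(goal)
    return order

print(build_order("package"))  # ['compile', 'test', 'package']
```

The order comes from walking the graph, so changing a build means editing the graph itself.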

When using a make-derived system, I begin a new project by copying and editing a build script from a previous project. There is a core of behavior which is unchanged. Most projects have targets named "compile", "test", "package", and "clean". Two Java programs will have similar commands in each of those targets, differing by adding a dependency here, changing a classpath there, but rarely differing in fundamental ways.

Could I describe these builds as differences from an underlying, simple Java build? I could, but describing such transformations on directed acyclic graphs requires sophisticated mathematical tools.

Maven does not use targets that form directed, acyclic graphs. Instead it uses sequences of labelled states, referred to as lifecycles. The labelled states are referred to as phases. So the Java program above might have two lifecycles, one with the states "compile", "test", and "package", and one with the single state "clean". A set of lifecycles, and the default behavior attached to them, is grouped in what is referred to as a "packaging".
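As a rough sketch of that structure, a packaging can be modelled in Python as a named set of ordered phase lists. The names `SIMPLE_JAVA_PACKAGING` and `lifecycle_of` are mine, and the lifecycles are the simplified ones from the paragraph above rather than Maven's real ones.

```python
# A "packaging" modelled as a set of lifecycles; each lifecycle is an
# ordered list of phases. Simplified: real Maven lifecycles have more phases.
SIMPLE_JAVA_PACKAGING = {
    "build": ["compile", "test", "package"],
    "clean": ["clean"],
}

def lifecycle_of(phase, packaging=SIMPLE_JAVA_PACKAGING):
    """Return the name of the lifecycle that contains `phase`."""
    for name, phases in packaging.items():
        if phase in phases:
            return name
    raise KeyError(f"unknown phase: {phase!r}")

print(lifecycle_of("test"))   # build
print(lifecycle_of("clean"))  # clean
```

There are no dependency edges to maintain here: the position of a phase in its list is the only ordering information.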

The packaged lifecycles will not fit any but the simplest projects unchanged, but the transformations are simple: add a hunk of behavior to a labelled state or remove a hunk of behavior from a labelled state. This is equivalent to the hooks system in Emacs or the advice system in Common Lisp. Maven takes a lifecycle phase as an argument, and starts at the beginning of the lifecycle containing that phase, executing all the behavior associated with the first phase, then with the subsequent phase, and so on until it reaches the specified phase.
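The add, remove, and run-up-to-a-phase operations can be sketched in a few lines of Python. This is a toy model under my own assumptions (the `Build` class and the lambdas standing in for behaviors are hypothetical), not how Maven is implemented.

```python
from collections import defaultdict

LIFECYCLE = ["compile", "test", "package"]  # simplified, illustrative

class Build:
    def __init__(self, phases):
        self.phases = list(phases)
        self.hooks = defaultdict(list)  # phase name -> list of callables

    def add(self, phase, behavior):
        """Attach a hunk of behavior to a phase."""
        self.hooks[phase].append(behavior)

    def remove(self, phase, behavior):
        """Detach a previously attached hunk of behavior."""
        self.hooks[phase].remove(behavior)

    def run(self, target_phase):
        """Run every phase from the start of the lifecycle through target_phase."""
        last = self.phases.index(target_phase)
        log = []
        for phase in self.phases[: last + 1]:
            for behavior in self.hooks[phase]:
                log.append(behavior())
        return log

build = Build(LIFECYCLE)
build.add("compile", lambda: "javac")
build.add("test", lambda: "junit")
build.add("package", lambda: "jar")
print(build.run("test"))  # ['javac', 'junit']; "package" is never reached
```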

While you can remove behavior from a phase and replace it with your own, most variations in a build of a particular project type are either adding some behavior before or after an existing behavior, such as running a precompiler across a piece of source code before passing it off to a compiler; or changing the parameters to an existing behavior, such as setting whether to include debug symbols.

The former is handled in Maven by defining extra phases before and after the standard phases in a lifecycle. For example, the actual clean lifecycle in the Maven jar packaging has the phases "pre-clean", "clean", and "post-clean". The build lifecycle in the jar packaging has many more.
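Here is a small Python sketch of why the extra phases are useful: a project can attach its own behavior to "pre-clean" without touching whatever is already bound to "clean". The phase names come from the clean lifecycle above; the behaviors and the `run_clean` function are hypothetical.

```python
# The three phases of the clean lifecycle, in order.
CLEAN_LIFECYCLE = ["pre-clean", "clean", "post-clean"]

def run_clean(bindings, upto="clean"):
    """Run the behaviors bound to each phase, in order, through `upto`."""
    executed = []
    for phase in CLEAN_LIFECYCLE[: CLEAN_LIFECYCLE.index(upto) + 1]:
        for behavior in bindings.get(phase, []):
            executed.append(f"{phase}: {behavior}")
    return executed

bindings = {
    "clean": ["delete build output"],         # stands in for the default behavior
    "pre-clean": ["stop local test server"],  # hypothetical project-specific addition
}
print(run_clean(bindings))
# ['pre-clean: stop local test server', 'clean: delete build output']
```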

The latter is handled by having the behavior define parameters which can be set in the build definition.
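A sketch of the idea in Python: the behavior declares its parameters with defaults, and the build definition overrides only the ones it cares about. The parameter names and defaults here are illustrative assumptions, not a faithful copy of any Maven plugin's configuration.

```python
# Hypothetical compiler behavior that declares its parameters with defaults.
COMPILER_DEFAULTS = {"debug": True, "source": "1.7"}

def configure(defaults, overrides):
    """Merge the build definition's overrides into the declared defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}

def compile_command(config):
    """Turn a configuration into a javac command line."""
    cmd = ["javac", "-source", config["source"]]
    cmd.append("-g" if config["debug"] else "-g:none")
    return cmd

# The build definition only states what differs from the defaults.
config = configure(COMPILER_DEFAULTS, {"debug": False})
print(compile_command(config))  # ['javac', '-source', '1.7', '-g:none']
```

Rejecting unknown parameters is a design choice in this sketch; it catches typos in a build definition early.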

If this were the extent of Maven's departure from the tradition of make it would be enough. However, there is a second, major departure.

In make and its spawn, you must manage your dependencies by hand. Libraries must be installed correctly, and any custom behavior for your build system must be available. These are all things that package management systems, as epitomized by apt and the BSD ports tree, handle for systems administration. Programmers, traditionally, have been left to struggle by hand. The package management is added downstream of actual development.

Maven provides package management in the development process. As in apt, every version of every artifact has a globally unique ID. Maven's IDs are made up of a group ID, which has conventions similar to packages in Java; an artifact ID, which must be unique for that group ID; and a version. Your Maven install keeps a list of repositories to check for artifacts, like apt's list of sources. When Maven encounters an artifact's ID, whether the artifact is a plugin providing behaviors for Maven or a library that the project depends on, it fetches the artifact in the way we are familiar with from package managers. All Maven's behavior besides executing lifecycles and fetching artifacts from repositories is defined by artifacts. That includes defining packagings. So given a project with a Maven build system and an install of Maven on your machine, you don't need to go search for dependencies to build the project. Maven does it for you.
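The ID scheme and the repository search can be sketched in Python as follows. The repositories here are in-memory stand-ins with hypothetical names, and `resolve` only illustrates checking them in order. The `repository_path` function follows the conventional layout of a Maven repository as I understand it.

```python
from collections import namedtuple

Coordinates = namedtuple("Coordinates", ["group_id", "artifact_id", "version"])

def parse(text):
    """Parse 'group:artifact:version' into Coordinates."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected group:artifact:version, got {text!r}")
    return Coordinates(*parts)

def repository_path(coords, extension="jar"):
    """Relative path of the artifact file inside a repository."""
    group_dir = coords.group_id.replace(".", "/")
    filename = f"{coords.artifact_id}-{coords.version}.{extension}"
    return f"{group_dir}/{coords.artifact_id}/{coords.version}/{filename}"

def resolve(coords, repositories):
    """Return the name of the first repository that holds the artifact."""
    for name, contents in repositories:
        if coords in contents:
            return name
    raise LookupError(f"{coords} not found in any repository")

# Hypothetical repositories, checked in order like a list of apt sources.
repositories = [
    ("local-cache", {parse("junit:junit:4.11")}),
    ("company-mirror", {parse("com.example:widgets:1.0")}),
]

widgets = parse("com.example:widgets:1.0")
print(resolve(widgets, repositories))  # company-mirror
print(repository_path(widgets))        # com/example/widgets/1.0/widgets-1.0.jar
```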

That is the core of Maven: replacing DAGs with a mathematically tractable way to declare modifications to sequences of operations; and providing package management a la apt for the developer.

