madhadron

Dependencies: a modest proposal

Status: Finished
Confidence: Remote. Provocative thought experiment.

There was yet another fracas in the Node.js ecosystem with its mass of libraries. It’s not the first time. And there is a whole slew of similar issues that have cropped up in recent years. If you handled your dependencies carefully, you could have been relatively unaffected by this, but that’s a lot of work in our current environment.

So what would tooling that encouraged better hygiene look like? I propose four changes:

  1. No transitive dependencies.
  2. All versions are pinned.
  3. CVEs break your build.
  4. Exponentially increasing build times as you add libraries.

1. No transitive dependencies

You import dependency A. It pulls in dependencies B and C. Each of those pulls in a few more. It’s easy to end up with hundreds of dependencies from a couple of lines. Each of the authors of those dependencies is making their job easier, and there is no back pressure on them to reduce their dependencies. We want to add some back pressure, and make it painful to use libraries with huge numbers of dependencies.

So instead of being able to list just A, you have to list A, B, C, and all the transitive dependencies explicitly. The build tool should tell you what the dependencies are and the acceptable version ranges for all of them, but you have to record them yourself. By hand. The maintainers of the build tooling should reject all automated generation of such lists.
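As a rough sketch of the check such a tool might run: walk everything reachable from the dependencies you listed, and report whatever you failed to write down. The names here (`REGISTRY`, `transitive_closure`, `missing_dependencies`) and the data are hypothetical, not taken from any real build tool. Note that it only reports the gap; it does not fill it in for you.

```python
# Hypothetical data: what each package declares it needs.
# Nothing here corresponds to a real registry or build tool.
REGISTRY = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
}


def transitive_closure(roots, registry):
    """Return every package reachable from `roots`, including the roots."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(registry.get(name, []))
    return seen


def missing_dependencies(listed, registry):
    """Packages required by something in `listed` but not listed by hand."""
    return sorted(transitive_closure(listed, registry) - set(listed))
```

With only A listed, `missing_dependencies(["A"], REGISTRY)` returns `["B", "C", "D"]`. Once all four are written out by hand it returns an empty list and the build can proceed.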

2. All versions are pinned

We often see dependencies like some-library:latest, which pulls whatever the latest version of the dependency is in some repository. We’ve seen multiple cases of this causing problems. No one should upgrade a dependency without testing it. Dependencies should be specified with exact versions.

We also need more than just a version. We need a signature. With a version number alone, we have no way of knowing if someone has replaced the contents of a particular version. So we need not only the version, but a hash or fingerprint of the contents of the dependency.

All those dependencies we have to explicitly list should be listed with an exact version and a signature of the version’s contents.
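One way this could look, as a minimal sketch: the pin is an exact version plus a digest of the package contents, and the build refuses anything that does not match. SHA-256 is my assumption for the fingerprint, and `fingerprint`, `verify_pin`, and the shape of `pins` are hypothetical names for illustration.

```python
import hashlib


def fingerprint(contents: bytes) -> str:
    """Hex SHA-256 digest of a package's contents (the hash choice is an assumption)."""
    return hashlib.sha256(contents).hexdigest()


def verify_pin(name, version, contents, pins):
    """Raise ValueError unless (name, version) is pinned and the contents match.

    `pins` maps (name, version) to the expected fingerprint.
    """
    expected = pins.get((name, version))
    if expected is None:
        raise ValueError(f"{name} {version} is not pinned")
    actual = fingerprint(contents)
    if actual != expected:
        raise ValueError(
            f"{name} {version}: expected fingerprint {expected}, got {actual}"
        )
```

If someone republishes different contents under the same version number, the fingerprint changes and `verify_pin` fails instead of silently accepting the new contents.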

3. CVEs break your build

Pinning your versions means that, if someone finds a bug or vulnerability, you need to explicitly upgrade. But no one wants to go out and check for bugs on every dependency. Instead, the repository you’re pulling from needs some way of annotating dependencies with vulnerabilities, bugs, and CVEs that affect specific versions. And when you build, your build system goes and checks for any on the versions you’re using. If there are any, you can either explicitly annotate the dependency to say that you’re ignoring that item, or change the version and test with that.
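A sketch of that build step, under assumptions: the repository exposes advisories keyed by exact package version, and your ignore annotations are a set of (package, advisory) pairs. The identifier `EXAMPLE-0001` is a placeholder, not a real advisory, and `blocking_advisories` is a hypothetical name.

```python
# Hypothetical advisory feed from the repository, keyed by (name, version).
ADVISORIES = {
    ("some-library", "1.2.3"): ["EXAMPLE-0001"],
}


def blocking_advisories(pinned, advisories, ignored):
    """Return (name, version, advisory_id) triples that should fail the build.

    `pinned` is a list of (name, version) pairs.
    `ignored` is a set of (name, advisory_id) pairs you have explicitly waived.
    """
    blocking = []
    for name, version in pinned:
        for advisory_id in advisories.get((name, version), []):
            if (name, advisory_id) not in ignored:
                blocking.append((name, version, advisory_id))
    return blocking
```

A build would fail whenever this list is non-empty. The two ways out match the two options above: add `("some-library", "EXAMPLE-0001")` to `ignored`, or pin a version the advisory does not cover and test with that.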

4. Exponentially increasing build times

Finally, we need some straight-up punishment for letting your dependency list get too long. The build system should add a sleep statement to the build. Have one dependency? Adds 50ms. Have two? 100ms. Three? 200ms. We keep doubling, and quickly your build becomes completely untenable. At 15 dependencies you’re already at nearly 14 minutes, and at 22 you’re past a day.
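The doubling rule works out to 50 × 2^(n−1) milliseconds for n dependencies. A minimal sketch, with `BASE_DELAY_MS`, `penalty_ms`, and `apply_penalty` as hypothetical names, and zero delay for zero dependencies as my own assumption:

```python
import time

BASE_DELAY_MS = 50  # delay for the first dependency, per the rule above


def penalty_ms(dependency_count: int) -> int:
    """Delay in milliseconds: 50 for one dependency, doubling for each one after."""
    if dependency_count <= 0:
        return 0  # assumption: no dependencies, no punishment
    return BASE_DELAY_MS * 2 ** (dependency_count - 1)


def apply_penalty(dependency_count: int) -> None:
    """Sleep for the computed penalty before the build starts."""
    time.sleep(penalty_ms(dependency_count) / 1000)
```

`penalty_ms` gives 50, 100, and 200 for one, two, and three dependencies, and 25,600 (about 26 seconds) for ten.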

Possible objections

“This will make it harder for people to quickly start new projects!” This is the same trap that made MongoDB popular. MongoDB stripped out all the parts important for operating in production (schema enforcement, data migration, secure default values), which started lots of people using it, and left a mass of production problems in its wake. Proper dependency handling is part of the intrinsic complexity of a project, not a distraction to be shoved under the rug.

“Won’t this just make people add their stuff to a smaller number of big libraries?” Why, yes. And then you have a critical mass where you can have standards for acceptance and packaging and put effort into the logistics of those fewer libraries. Eventually stuff gets flowed into the standard library of the language and stops being a dependency at all, as the flow of stuff from Boost into the C++ standard library demonstrates.

“People will reinvent the wheel!” This is an obsession going back to the 1980s, when people dreamed of making programming into a low-skilled task of plugging together reusable modules. It’s okay for people to write a left pad function or similar things as they need them. It often takes about the same time as finding and figuring out a new dependency.

“This raises the bar for people to contribute to open source.” It doesn’t really. It just reduces the blast radius an individual can have without doing the emotional labor of engaging with their community and environment. You can still publish a leftpad library, but no one will use it unless you do the work to make it part of some community standard library.

“No one would switch to a build system that does this.” This is true.