Get unstuck on your infrastructure choices

Status: Finished
Confidence: Very likely

We build systems to do interesting things. Figuring out where to host them, which distribution of Linux to use (or whether to run Linux at all), and which tool to use to configure all these machines does not make interesting things happen. At the same time, what if we get this wrong? It can be disastrous.

So, time to roll up your sleeves and learn to use all the options in depth? Absolutely not. The things that go disastrously wrong show up only with long experience, not while you’re still learning a tool. And you don’t need the right choice, you just need one of the not-wrong choices. Take which Linux distribution to use, for example. Once you have invested the (many) hours to learn the ins and outs of CentOS and Debian well enough to decide which is better for your situation, you will also have learned that the differences are negligible and that you would have been equally well served by just learning one of them and getting on with life.

tl;dr: Save yourself a lot of time. Don’t read comparison articles or study which tool is the best. Choose one from each of the following three categories (each in alphabetical order):

Decide based on the following criteria:

  1. Has your company already standardized on one of these? Use what they do.
  2. Do you already have experience on one of them? Use what you know.
  3. Do you have a friend or colleague who knows one of them and who will help you? Use what they know.
  4. Pick one at random.

If you’re really stuck, here’s a tool to generate your choices:
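
A minimal sketch of such a picker, assuming Python and a hypothetical `pick` function (neither is from the original tool). It applies the four criteria above in order and only falls back to a random choice when none of the first three gives an answer. The distribution and configuration manager lists are the ones discussed later in this piece; the hosting options are not named in this text, so you would pass in your own list for that category.

```python
import random

# Options named later in this piece. The hosting list is not given in the
# text, so it is left out here; supply your own.
DISTRIBUTIONS = ["CentOS", "Debian", "OpenSuSE"]
CONFIG_MANAGERS = ["Chef", "Puppet", "SaltStack"]


def pick(options, company_standard=None, you_know=None, friend_knows=None,
         rng=random):
    """Apply the four criteria in order and return (choice, reason).

    `pick` and its parameter names are hypothetical, for illustration only.
    A preference that is not among `options` is ignored.
    """
    preferences = (
        (company_standard, "your company standardized on it"),
        (you_know, "you already know it"),
        (friend_knows, "a friend or colleague knows it"),
    )
    for candidate, reason in preferences:
        if candidate in options:
            return candidate, reason
    # Criterion 4: nothing else decided it, so pick one at random.
    return rng.choice(sorted(options)), "picked at random"


print(pick(DISTRIBUTIONS, you_know="Debian"))
print(pick(CONFIG_MANAGERS))
```

The first call returns Debian because criterion 2 settles it; the second has nothing to go on and picks at random, which is the whole point: any of the three is fine.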

That’s the tl;dr. Let’s get into the meat.

What if I join a company that uses a different tool than the one I learned?

Don’t worry about it. Once you know one tool in each category, learning another of them isn’t hard. It’s just not worth doing until you have to. If you learned Chef and you’re hired by a place that runs Puppet, you will have people there to help you learn it and learning it will be part of the expected ramp up time before you’re effective.

Why are you only recommending big cloud providers for hosting?

The fundamentals are still the same: buy a computer, install Linux on it, get it connected to the Internet, run your own DNS server to handle your DNS records, deploy your software to it, and make a deal with a company somewhere to take your backups. Or you can rent a virtual machine from someone and have someone else host your DNS for you. You pay a little more for that, but unless you have economies of scale, you more than make up for it by not having to fix and replace hardware, and by not having to remember how your DNS server works when you only touch it every few months.

Even simpler is having one company provide all of it. Your time is expensive, and every piece of logistical and operational burden that you can slough off for a small amount of money is worth it. Later you may have economies of scale yourself that will make it cheaper to shop around for hosting or run your own physical machines. That’s not where you start, though.

Why these three Linux distributions?

What’s special about CentOS, Debian, and OpenSuSE? Why not Ubuntu? Or Arch or Slackware? Or something else entirely, like FreeBSD or Illumos? Here’s what I’m looking for:

Criteria for choosing a Linux distribution:

  1. A long (>20 year) record of stable releases.
  2. A release life cycle that includes kernel and security updates for at least 3 years.
  3. A big enough community where you’re unlikely to be the first person to encounter a problem.

CentOS: CentOS is the free version of Red Hat Enterprise Linux. I first ran it in production (under its old name of Red Hat Linux) in 1996. Releases are supported for eight to ten years. If you’re in North America, it’s ubiquitous. The community is large, the documentation is meticulous, and if you really need to, you can pay the money to turn it into Red Hat Enterprise Linux and have enterprise support.

Debian: I first ran Debian in production around the same time as Red Hat. They have several branches going at all times: unstable, testing, and stable. A release begins as the unstable branch, gets frozen as the testing branch, and then becomes the stable branch. Meanwhile more releases are wending their way through behind. Debian stable has been boringly stable for over twenty-five years. Debian releases are supported for three years, but there’s a group now doing backporting for five.

OpenSuSE: OpenSuSE is the free version of SuSE Enterprise Linux. It’s slightly older than Red Hat Linux and Debian (1992 instead of 1995 and 1993), and most common in Europe. Again, decades of stable releases. If you’re willing to pay them, SuSE will support their releases for 13 years. If you’re not paying, you get three years.

For comparison, consider these three:

Arch: Arch is solid, and its wiki is an amazing resource for the community. It’s also young by the standards of this list. It began in 2002. There were ten-year-old production installations of Debian and SuSE when Arch was still one person’s hobby project. Give it a few years. More importantly, Arch expects you to follow a rolling release. That breaks criterion (2) above. Rolling releases are great when you have a laptop and two servers. When you have a laptop and ten thousand servers, they’re not.

Slackware: Slackware has been around as long as Debian and it’s utterly solid. It’s also a tiny, shrinking community. Leave it to the hobbyists who have spent twenty years with it. FreeBSD is likewise a wonderful system with a small community. Illumos (née Solaris) is all but dead.

Ubuntu: Linux users flocked to Ubuntu in the early 2000s when it was one of the first distros to provide decent hardware detection. In the years leading up to that, many of us were installing the Knoppix LiveCD distribution to hard disks to avoid having to configure our X servers by hand. There are two big problems with Ubuntu, though. First, from time to time they put out shoddy releases. Second, they often replace pieces of the system with their own projects for a couple of releases, then switch to what the rest of the community is doing, forcing you to rewrite your configurations several extra times for no good reason. Since they pull their packages and infrastructure from Debian anyway, you may as well go upstream and run Debian.
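
To make the first two criteria concrete, here is a rough sketch in Python that checks them against the years and support windows quoted above. The `Distro` class and the choice of 2019 as the reference year are assumptions for illustration; criterion (3), community size, is a judgment call and is not modeled. Ubuntu is left out because no exact start year is given for it here.

```python
from dataclasses import dataclass
from typing import Optional

AS_OF = 2019  # assumption: the reference year for "how old is it?"


@dataclass
class Distro:
    name: str
    first_release: int            # year, as quoted in the text above
    support_years: Optional[int]  # None means a rolling release

    def passes(self, as_of: int = AS_OF) -> bool:
        # Criterion 1: a record longer than 20 years.
        old_enough = as_of - self.first_release > 20
        # Criterion 2: each release gets updates for at least 3 years.
        supported = self.support_years is not None and self.support_years >= 3
        return old_enough and supported


DISTROS = [
    Distro("CentOS", 1995, 8),    # Red Hat lineage; low end of "eight to ten"
    Distro("Debian", 1993, 3),
    Distro("OpenSuSE", 1992, 3),  # the unpaid support window
    Distro("Arch", 2002, None),   # rolling release, so no fixed window
]

for d in DISTROS:
    print(d.name, d.passes())
```

With these numbers the first three pass and Arch fails on both counts: it is under twenty years old, and a rolling release has no multi-year support window.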

Why a configuration manager instead of Kubernetes?

Two reasons:

  1. You have to configure the hosts that are running Kubernetes. For that, you need a configuration manager.
  2. Kubernetes excels for stateless services. Unfortunately, the interesting parts of software are all about state. You don’t casually kill and start containers for your database. Again, you need the configuration manager.

So you’re going to run a configuration manager. The question, then, is whether you need Kubernetes as well.

Why these three configuration managers? Why not Ansible or something else?

There are lots of configuration managers out there. It’s a common project for someone to write their own.

Criteria for choosing a configuration manager:

  1. Windows client support, because inevitably you’re going to need it at some point, and it’s not simple to add.
  2. Pull model.
  3. A big enough community where you’re unlikely to be the first person to encounter a problem.

What do I mean by pull model? Compare it with Ansible’s push model: you run a command, and it ssh’s into lots of machines, making changes on each. The updates only happen when you run the command. The systems I’ve listed all use a pull model. You have a server somewhere that holds the desired state. Each machine connects to it on a regular basis, gets the current desired state, and brings itself into line. When you’re configuring two or three machines it doesn’t matter which you use. When you end up with thousands, you need the pull model.
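
The shape of the pull model can be sketched in a few lines of Python. This is a toy simulation, not how Puppet, Chef, or SaltStack are implemented; `StateServer`, `Machine`, and `check_in` are hypothetical names, and the "state" is just a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class StateServer:
    """Holds the desired state. It never connects out to the machines."""
    desired: dict = field(default_factory=dict)

    def fetch(self) -> dict:
        # Hand out a copy so a machine cannot modify the server's state.
        return dict(self.desired)


@dataclass
class Machine:
    name: str
    actual: dict = field(default_factory=dict)

    def check_in(self, server: StateServer) -> dict:
        """One pull cycle: fetch desired state, change only what differs."""
        desired = server.fetch()
        changes = {k: v for k, v in desired.items()
                   if self.actual.get(k) != v}
        self.actual.update(changes)
        return changes


server = StateServer({"ntp": "installed", "sshd_port": 22})
fleet = [Machine(f"web{i}") for i in range(3)]

# In practice each machine runs this on its own timer; here we loop.
for machine in fleet:
    machine.check_in(server)

server.desired["sshd_port"] = 2222   # edit the desired state in one place
late_joiner = Machine("web3")        # a machine that is new or was offline

for machine in fleet + [late_joiner]:
    print(machine.name, machine.check_in(server))
```

Nobody has to remember to run anything against `web3`. It converges the first time it checks in, which is the property that matters once the fleet is too large to track by hand.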

Puppet and Chef have big communities and established practice. As of 2019, SaltStack just squeaks over that line, but it’s growing fast.

What about…

You can split hairs about this all day. I’m not offering perfect answers or the only answers. These are just answers that are not wrong enough to matter in practice. Five years from now you’ll look back and say, “Meh, that was good enough,” and go back to doing interesting things.

So jump back to the tl;dr and get started.